
MENTAL HEALTH SERVICE SYSTEM REPORTS

SERIES BN: Needs Assessment and Evaluation

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service
Alcohol, Drug Abuse, and Mental Health Administration
National Institute of Mental Health


Series BN No. 5

Program Performance Measurement: Demands, Technology, and Dangers




Edited by

Charles Windle, Ph.D.
Division of Biometry and Epidemiology

U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Public Health Service
Alcohol, Drug Abuse, and Mental Health Administration
National Institute of Mental Health
Division of Biometry and Epidemiology
5600 Fishers Lane
Rockville, Maryland 20857




The editing of this monograph was done as a part of official Government duties at the National Institute of Mental Health. However, the opinions expressed herein are the views of the authors and do not necessarily reflect the official position of the Institute or any other part of the U.S. Department of Health and Human Services.

The editor is indebted to Robert Perloff of the Graduate School of Business, University of Pittsburgh, for his suggestions based on an early review of this book.

The quotes, articles, or measures cited in this report as copyrighted are reproduced herein with permission of the copyright holders. Further reproduction of these copyrighted materials is prohibited without specific permission of the copyright holders. All other material contained in this volume is in the public domain and may be used or reproduced without permission of the Institute, the authors, or other DHHS agencies. Citation of the source is appreciated.


Suggested Citation

National Institute of Mental Health. Series BN No. 5, Program Performance Measurement: Demands, Technology, and Dangers, Windle, C., ed. DHHS Pub. No. (ADM)84-1357. Washington, D.C.: Supt. of Docs., U.S. Govt. Print. Off., 1984.


DHHS Publication No. (ADM)84-1357
Printed 1984




Foreword 


Quantitative measurement has become an item of considerable interest to the mental health field. Following the lead of the general health system, practitioners increasingly look toward laboratory results to decide when and how to treat mental illnesses such as depression. They will soon join the general health field in justifying third-party reimbursement based on the number of a Diagnostic Related Group (DRG). Cost-benefit studies, utilization review processes, and performance measurement systems provide numbers by which the quality of our work is evaluated. Many among us who believe that objective measures are key to understanding the service system applaud this movement toward decisions based on data. But the sophisticated "data cruncher" looks at this veritable explosion of numbers and is forced to ask, "What do the numbers really mean?" Do they actually reflect quality and lead to more rational decisionmaking? Or are they just scientific trappings, a mirage used to rationalize the same old subjective decisionmaking process?

In this light, the present volume is a large step toward a dispassionate and critical examination of a set of natural experiments in the use of quantification for management: program performance measurement systems. Most were developed and applied by the Federal Government, but some have been introduced by States. As Dr. Windle aptly states in his preface, how to measure effectiveness is not nearly as important as what to measure. I would suggest that, in addition to deciding what to measure and how to measure it, a most important question for managers and policymakers is "After I've measured it, how can I use the measure?" A few numbers obviously do not give us a full picture of a program, any more than a laboratory test gives us a complete view of a patient. And, like test results, their use must be tempered by the context. This book provides both actual examples of the importance of context and suggestions for how to develop performance measurement systems.

A final note: as one looks at the performance of a program or a group of programs, it should continually be kept in mind that these are not islands isolated from the rest of the service system. A high or a low value on a particular measure for a program may be due in part (or entirely) to any of the following:
the presence or absence of a similar service elsewhere in the system;
a policy or regulation at the State or national level;
pressure from the community, third-party payers, or other powers in the program's environment.
Quantitative measures are seductive. They imply precision and objectivity and therefore seem to be a good basis for action. In fact, they can be extremely valuable indicators of what is happening in the system. But we must remember at all times that they are only a lighthouse perched atop an iceberg of complexity within the system of services for the mentally ill.

James W. Thompson, M.D., M.P.H.
Chief
Service System and Economics Research Branch
Division of Biometry and Epidemiology, NIMH




Preface 


This monograph is the result of a project stimulated by the Operations Management System (OMS) of the Department of Health and Human Services (DHHS). Large organizations such as DHHS face many problems and difficulties. They oversee many programs, which in turn involve large numbers of projects in local communities, States, or industries. Both steps in the oversight process pose large technical and political problems: central control over the various Federal programs, and each program's control over its local projects.

Technical difficulties in administering programs to serve local needs flexibly stem from several sources:
the breadth of services that are needed;
the diversity among projects;
the diversity in environmental contexts in which projects operate;
the geographic and organizational distance between local project sites and central planners and administrators;
the logical difficulty of designing programs to serve needs when the programs may stimulate as well as satisfy needs.

Political difficulties result from the following:
the many groups with an interest in the programs;
the diversity of these interests, which range from direct and indirect recipients of service to paid service providers, providers of alternative competing services, and taxpayers;
the unintended or unacknowledged nature of some program impacts;
logical conflicts among interests;
linkages among interested parties, such as local program beneficiaries or advocates, executive branch program staff, and legislators (Foley and Sharfstein 1983);
characteristics of organizations, such as the concentration of information in the upper echelons, which shifts power to a limited number of actors (Michels 1949).

Agencies attempt to administer these programs through various management techniques, relying heavily upon approaches that have current standing in the private sector or public administration. In 1979, Department Secretary Harris initiated an OMS, employing program output indicators to be used by the Department and by certain programs within the Department. One of these programs was the Community Mental Health Centers (CMHC) Program managed by the National Institute of Mental Health (NIMH).

In NIMH, there seemed to be three reactions to the requirement for performance measurement. Some staff were reluctant to implement such a system, believing that (1) important program goals and clinical processes were too complicated to be captured by statistical indices; (2) the program was so complex and required so much flexibility to meet special local conditions that program statistics were insufficient as a guide to program management; and (3) such statistics, if collected on the program, might enable groups unsympathetic to the program to take actions that would harm CMHCs.

A second group in NIMH welcomed the requirement as an opportunity to increase knowledge about the CMHC Program, to improve NIMH's information collection process, and ultimately to improve program management by use of better information.

A third set of NIMH staff felt that a performance measurement system is so difficult to design technically and to operate politically that much preparation is required before or during implementation. Some who shared this latter view attempted to describe the scope of considerations that should be included in implementing a performance measurement system (Keppler-Seid, Windle, and Woy 1980). The introduction of this book is a more general version of our analysis. In brief, we believed the implementation process needed to include:

• research to validate the particular indicators chosen as performance measures (NIMH 1981),

• a studied and collaboratively developed plan for using the performance measures, and

• broader participation in the development of indicators and in the design of the entire performance measurement system.

This conceptualization of the features required in performance measurement systems led to the present book. It seemed clear that two activities, measurement and management, had to merge for useful performance measurement to exist. The possibility of and requirements for performance measurement, then, should emerge from considering both the state of the art of program measurement and the state of the art of program management. Papers describing each of these domains constitute parts I and II of the book.

An alternate approach to understanding program performance measurement is to examine past attempts to design and apply such systems. Seven cases covering a wide range of programs are presented in part III.

The purpose of this book is to describe the state of the art of program performance measurement. It is neither to advocate such systems nor to discourage program administrators or government regulators from attempting to develop them. Rather, it is to provide information about such systems and experience with them. Those considering whether to render programs more rational, focused, and accountable through performance measurement can then reach wise decisions. Those actually attempting to implement performance measurement systems can move beyond past mistakes.

The most fundamental mistake in thinking about performance measurement is to believe that problems in this area are primarily technical: questions of how to measure effectiveness or productivity. More fundamental is the question of what to measure. The excellent review of the literature on performance measurement by Kanter and Brinkerhoff (1981) accepts a "political" model of organizations. In this model, organizations are battlegrounds for stakeholders, inside and outside the organization, who compete to influence the criteria of effectiveness to advance their own interests. This view suggests that "multiple constituencies and multiple environments require multiple measures" (Kanter and Brinkerhoff 1981, p. 344). Performance measurement in a pluralist system can therefore be complex. This vision of a multi-index, pluralistically based performance measurement system remains far from what programs have achieved so far, perhaps because their scope of participation is seldom broad.

A related important feature of performance measurement systems is that they assume that programs do or should operate rationally: constructing measurable goals and objectives, seriously pursuing these goals, and using information on goal achievement for guidance. It is clear that programs do not operate in a strictly rational, economic manner. Further, it is doubtful that they should. Stone (1980) suggested that social programs must be responsive to the variety of citizen needs and values. Politics therefore necessarily permeates the implementation process as well as the initial goal-setting process. Thus, administrative efficiency may be less important than value tradeoffs. Kelly (1980) pointed out that a simple economic efficiency approach aims for productivity indices based on output/input ratios and for cost-benefit analyses. The broader political-sociological-psychological perspective, which sees politics continuing to affect program implementation, aims for client satisfaction and outcome measurement. The different orientations imply different types of performance measures, based on different types of data and used by different groups.
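The economic-efficiency orientation Kelly describes can be illustrated with the simplest form of a productivity index, which divides outputs by inputs. The symbols and figures here are hypothetical and serve only to show the arithmetic. O stands for some countable output, for example hours of client service delivered. I stands for the resources consumed, for example staff cost in dollars.

```latex
P = \frac{O}{I}
  = \frac{12{,}000 \text{ service hours}}{\$300{,}000}
  = 0.04 \text{ service hours per dollar}
```

Nothing in P registers whether clients were satisfied or helped. Capturing that requires the outcome and satisfaction measures associated with the broader perspective.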

Many public service programs were initiated during the 1960s. As their cost, and the gap between their rhetoric and their accomplishments, have become visible, a demand for program accountability has grown. Program evaluation became widely accepted as a means for accountability and program improvement. Program performance measurement seems the most rational form of program evaluation, and thus an aid to rational program management. Whether and when this approach to management is appropriate is a matter of debate. The debate has been aggravated by awareness of lowered national U.S. productivity and by concern with alternative systems of management (Ouchi 1981). To design future management and accountability systems, it will be important to observe what performance measurement systems achieve and the conditions under which this approach works well or poorly. The present book is a start in this direction.


Charles Windle
Chief, Service System Research Program
Division of Biometry and Epidemiology
National Institute of Mental Health


Foley, H.A., and Sharfstein, S.S. Madness and Government: Who Cares for the Mentally Ill? Washington, D.C.: American Psychiatric Press, 1983.

Kanter, R.M., and Brinkerhoff, D. Organizational performance: Recent developments in measurement. Annual Review of Sociology 7:321-349, 1981.

Kelly, R.M. Ideology, effectiveness, and public sector productivity: With illustrations from the field of higher education. Journal of Social Issues 36:76-95, 1980.

Keppler-Seid, H.; Windle, C.; and Woy, J.R. Performance measures for mental health programs: Something better, something worse, or more of the same? Community Mental Health Journal 16:217-234, 1980.

Michels, R. Political Parties. Translated by E. and C. Paul. Glencoe, Ill.: Free Press, 1949. (First published in 1915.)

National Institute of Mental Health. "A guidebook to the 1981 Operations Management System for federally funded community mental health centers." Rockville, Md.: the Institute, April 1981.

Ouchi, W. Theory Z: How American Business Can Meet the Japanese Challenge. Reading, Mass.: Addison-Wesley, 1981.

Stone, C.N. The implementation of social programs: Two perspectives. Journal of Social Issues 36:13-34, 1980.




Contents

Foreword

Preface

Introduction: A Model for Implementing Performance Measurement
Charles Windle and Heather Keppler-Seid

Part I: The State of the Art of Measurement

Human Service Information Systems
C. Clifford Attkisson and Anthony Broskowski

Measuring the Quality of Medical Care: Process Versus Outcome
William E. McAuliffe

The Status of Productivity Measurement in the Public Sector: An Update
Harry P. Hatry

Part II: The State of the Art of Management

Data-Based Monitoring
Homer J. Hagedorn

Dysfunctional Potentials of Performance Measurement
Pauline E. Ginsberg

Part III: Case Studies in Performance Measurement

Social Performance in an Engineering Framework: The LEAA Experience
Edwin W. Zedlewski

Performance Indicators in Housing
Vicki Elmer

Developing Performance Standards and Measures in Vocational Rehabilitation
Susan Stoddard

State Mental Health Program Performance Measurement: Selected Impressions From Three States
Wayne A. Kimmel

Unintended Effects of Program Performance Measurement in Pennsylvania
Pauline E. Ginsburg

A Case Summary: The Department of Health, Education, and Welfare's Development and Use of Certification Criteria

A Case Summary: Performance Information in the Atlanta School System
Bayla F. White

Concluding Remarks: The Performance of Performance Measurement


Introduction

A Model for Implementing Performance Measurement*

Charles Windle, Ph.D., and Heather Keppler-Seid


The American public increasingly demonstrates dissatisfaction with and diminished faith in the Federal Government and its programs (Institute for Social Research 1979; Yankelovich and Kaagan 1979; Union Carbide Corporation 1980). This growing dissatisfaction expresses itself in a demand for evidence of efficiency and effectiveness in Government-supported programs. As a result, the Government feels an increased need to obtain information from the programs it supports. It needs this information both to document program success (and thereby justify continued funding) and to spur greater program accomplishments. Further, as Schick (1971) pointed out, the focus shifts depending on resources. When resources are available for new programs, the focus is on program analysis; when resources are scarce, as in these inflationary, tax-cutting times, it shifts to program evaluation.

Program evaluation is Government's answer to distrust and accusations of waste. It can be defined in a variety of ways (Glass and Ellett 1980) and can use a variety of methods and criteria (Landsberg et al. 1979). It can focus on the interests of diverse program stakeholders (Eyman and Windle 1976; Windle and Woy 1977) and can be directed toward several purposes (Windle and Neigher 1978). Program evaluation has been widely recognized as increasing in popularity through the 1970s (e.g., Flaherty and Morell 1978; Wertheimer et al. 1978). Its fate in the 1980s was less clear, because the Federal social programs to which it was applied were cut and assigned to the States in block grants (Neigher et al. 1982). Windle (1979) suggested that the purpose of program evaluation is evolving from self-improvement to accountability, but such a trend may take a long time to come to fruition.


*This chapter expands on a more limited version of the material directed toward mental health programs (Keppler-Seid et al. 1980).


Accountability systems are based on a number of assumptions that may not be adequately recognized. Accountability assumes a clearly hierarchical relationship between two or more groups. One group is defined as accountable to the other. That is, it must report information that the other group can use, presumably to act more wisely in dealing with the accountable group.

Accountability also implies routine, predictable, and perhaps even quantitatively oriented programs, or a focus upon such characteristics within programs. Similarly, there is likely to be more emphasis on structural or administrative aspects of the program, such as cost and proper use of staff time, than on program outcomes or benefits (Alger 1980).

A number of these features of accountability systems seem incompatible with programs that lack agreed-upon technologies and are operated by professionals accustomed to autonomy. Sometimes techniques are so complex that they cannot be described clearly enough for relatively untrained persons to use them, or the conditions necessary for successful application are altogether unknown. In such cases effective practice is regarded as an art rather than a science. It is to the advantage of practitioners to have their skills so regarded. Lacking competition or critical review, they can then demand high prices and prestige. Ingram (1979) has described how the role definition of university professors prevalent in academia makes both professors and students devalue student evaluations of professors and oppose standardized evaluation procedures. Similar incompatibilities exist between the role definitions of most professionals and standardized accountability systems.

It is worth observing that some opposition to accountability comes not only from the elites whose autonomy would be limited. It also comes from the general public and from relatively powerless groups of consumers whose interests accountability is alleged to serve. Such opposition may spring from what has been called "clientelism," personal ties between those of higher and lower status (Schmidt et al. 1977). It may also spring from acceptance of an inequitable status quo that is rationalized by a political language (Edelman 1977).

[Figure 1. A model for implementing a service program performance measurement system. Legible labels include "Changes in societal conditions and needs" and "Changes in service programs."]

It should be noted that the use of performance measures represents only one of five types of accountability identified by Bjorkman and Altenstetter (1979):
political (submission of political leaders to elections);
bureaucratic (subordinates answering to superiors within organizations);
professional (peers exercising accreditation and recognition on the basis of skill and control of information);
economic (market relationship of consumers to providers of products);
legal (judicial procedures by citizens against bureaucrats for nonperformance of political statutes).
As Bjorkman and Altenstetter (1979) argue for health care, ". . . all types of accountability must be intermeshed and made to work better" (p. 378). Performance measurement is a bureaucratic form of accountability, but it may require the support of other types of accountability in order to operate.


An  Implementation  Model 

Some critics of social programs have identified the implementation phase of programs as a critical problem, often leading to frustration of the hopes that initiated the programs (Williams and Elmore 1976). A performance measurement accountability system seems extremely vulnerable to problems in implementation. These vulnerabilities may best be seen by analyzing the elements involved in implementing such a system. We propose a four-element, four-functional-level model (see figure 1). The four elements are:

1. Goals or dimensions on which to judge a program's accomplishments

2. Indicators and specific ways to measure performance with these indicators

3. Data generation or measurement system

4. Oversight management system to ensure that performance measures are used appropriately, effectively, and efficiently

In addition to the four elements, the model for implementation includes four levels of functioning: research, design, support-generation, and operation. Research is needed to validate performance measures, assess the reliability of data sources, and evaluate the uses and impacts of the measurements on the services being provided. It is less clear that research is as appropriate for the goal-setting process. Helping program managers express or formulate their program's goals may involve more consultation than research. Getting the public to express and formulate its values regarding a program may require more advocacy or agitation than research. However, limited research-like activities can assist the process, such as surveys of the opinions of program staff or the general public, comparison of the logic and assumptions of the program with findings in the literature, and evaluability assessments (Wholey 1979).

The design and support-generation levels could be combined in a broad planning level. We have chosen to separate them to highlight the need for attitudinal preparation, a need that is vulnerable to being subordinated to content aspects of the design. The design level involves planning the contents of a performance measurement system. Support-generation includes the dissemination of information about the purpose and characteristics of the planned performance measurement system. It also includes efforts to win acceptance by all those who must contribute information to the system or use its results. Acceptance is more likely under three conditions: there are obvious opportunities for participation; a wide variety of those interested in the programs whose performance is to be measured actually participate to a large degree; and participants feel their views receive favorable consideration.

The operations level represents a working performance measurement system. Ideally, it would include a feedback loop that changes the performance measurement system as the program and societal needs for the program change.
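One way to picture the model is as a grid in which each of the four elements must be addressed at each of the four levels. The Python sketch below is our own illustration, not the authors' notation. The element and level names come from the text. The assignment of research tasks to particular elements is our reading of the paragraph on the research level. The helper names `build_model_grid` and `unplanned_cells` are hypothetical.

```python
from itertools import product

# The four elements and four functional levels named in the text.
ELEMENTS = ["goals", "indicators", "data generation", "oversight"]
LEVELS = ["research", "design", "support-generation", "operation"]


def build_model_grid():
    """Return a dict mapping (element, level) to a task description or None.

    Only the research-level cells are filled in, paraphrasing the text;
    every other cell is left as None for a planner to complete.
    """
    grid = {cell: None for cell in product(ELEMENTS, LEVELS)}
    grid[("goals", "research")] = (
        "limited research-like aids: opinion surveys, comparison with "
        "the literature, evaluability assessment"
    )
    grid[("indicators", "research")] = "validate performance measures"
    grid[("data generation", "research")] = "assess reliability of data sources"
    grid[("oversight", "research")] = "evaluate uses and impacts on services"
    return grid


def unplanned_cells(grid):
    """List the (element, level) pairs that still lack a planned task."""
    return [cell for cell, task in grid.items() if task is None]


if __name__ == "__main__":
    grid = build_model_grid()
    print(len(grid), "cells;", len(unplanned_cells(grid)), "still unplanned")
```

In this picture, the planning the authors describe amounts to filling in the design, support-generation, and operation cells for each element. The operation level's feedback loop would mean revisiting the grid as programs and societal needs change.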

Each step of the model will be discussed here. To show how the model would work in practice, each step is illustrated with an example from the goal of equity of community mental health services to ethnic minorities.


Specification of Program Goals and Values

The first element of the model has two components: identification of program goals and objectives, and specification of the values of other actors in the service system. This element combined with the next one, performance indicator specification, is much the same as what Wholey and others have called evaluability assessment or exploratory evaluation (Wholey et al. 1975; Wholey 1979). They believe that this process is fundamental for planning evaluations relevant to managers' concerns and for ascertaining whether a meaningful evaluation is possible.

Wholey (1979) describes evaluability assessment as a multistep process. The evaluator first defines the program to be evaluated and collects information on it in order to develop a concise description of the program and its expected results. Using this description, the evaluator can document the extent to which program activities and objectives are defined in measurable terms. The evaluator then collects the necessary information and clarifies which performance measurements are plausible and feasible, given the activities under way. Finally, he or she identifies evaluation and management options and presents the results to management for its use. At this point, one may conclude that a program is not evaluable because it lacks clear, consistent, measurable goals. Such a judgment can save resources from being expended fruitlessly on measurement. On the other hand, a finding that a program is not evaluable may prompt the program to be more specific in its goals and objectives. Meaningful measurement criteria can then be set later.

This approach derives values largely, if not exclusively, from program goals. It therefore slights implicit values and the views of persons who are committed not to the particular program but only to objectives the program could achieve. We believe Wholey's procedure can be improved by broadening the description of the program goals to incorporate the following:
(1) changes in the importance of particular program achievements between the time initial goals are set and the time program results occur;
(2) the personal values of those who operate the program, since these values usually influence program actions strongly but are seldom explicitly recognized;
(3) the values of other relevant community groups, such as agencies whose cooperation is needed, clients, and the program's competitors and enemies.

The specification of program goals and values, then, should be broadened to include a clear understanding of the social values a given service program should advance, including the compromises necessary between competing values. Making explicit the theoretical assumptions and values that guide interventions is likely to be very difficult for technologies that are more art than science, or where professionals lack consensus about theory or procedure (Pottinger 1979). To be realistic, such a conceptualization might be done in terms of the public interests to be served and then modified to reflect the unavoidable compromises needed to meet political, governmental, and staff interests.

Validation of this element will be political rather than scientific. The underlying values may change as the new accountability system changes existing services, which will in turn alter the pattern of needs, demands, and client groups. These changes probably will be expressed through voting, citizen participation on boards, attacks or support by interest groups, staff turnover and dissatisfaction, and so on.

An example of specifying goals and values may be taken from the Community Mental Health Centers (CMHC) Program. A primary goal of the CMHC Program is responsiveness to community needs. Since ethnic minorities in the United States are more subject to discrimination and poverty than the white majority, they are likely to have more unmet needs for public mental health services than whites. Thus, an objective against which the performance of a CMHC might be measured is the relative extent to which it provides mental health services to ethnic minorities.

Specification of Performance Indicators

After a program's goals have been specified, these goals must be operationalized and meaningful indicators chosen. Indicators of service processes and outcomes that reflect the purposes, costs, and other likely consequences of the program probably will be necessary at two levels. Generic indicators will be common to most services and will be measured in the same way in most programs, whereas specific indicators will apply only to certain programs. Indicators also will be needed to measure both the amounts and the impacts of activities aimed at systems change. Organizations providing only direct services to clients, of course, will use only service indicators.

To ensure that the data are meaningful, the task of choosing indicators must include identifying reciprocal indicators, i.e., statistics that check on factors which, if uncontrolled, might undermine the meaningfulness of changes on a primary indicator (Fontane 1975). For example, measures of the number of clients served should be accompanied by evidence that the types of clients served or the types of services given have not changed at the same time. The Urban Institute has provided a framework for identifying types of policy-relevant information and performance indicators relating both to institutions and to society (Garn et al. 1976).
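A reciprocal indicator can be attached to a primary count with only a few lines of code. The sketch below assumes that client counts are available by client type for two periods. The function names and the 5-percentage-point `tolerance` are hypothetical choices made for illustration and are not values from the literature cited.

```python
def client_mix(counts):
    """Convert counts of clients by type into shares of the total caseload."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clients recorded")
    return {kind: n / total for kind, n in counts.items()}

def served_with_reciprocal_check(before, after, tolerance=0.05):
    """Primary indicator: change in total clients served between two periods.
    Reciprocal indicator: client types whose share of the caseload moved by
    more than `tolerance` (a hypothetical threshold), which would undermine a
    simple reading of the change in numbers served."""
    mix_before, mix_after = client_mix(before), client_mix(after)
    kinds = set(mix_before) | set(mix_after)
    shifted = sorted(k for k in kinds
                     if abs(mix_after.get(k, 0.0) - mix_before.get(k, 0.0)) > tolerance)
    return sum(after.values()) - sum(before.values()), shifted

# Illustrative counts: 20 more clients, but the mix has shifted sharply.
print(served_with_reciprocal_check({"chronic": 40, "acute": 60},
                                   {"chronic": 20, "acute": 100}))
# (20, ['acute', 'chronic'])
```

In this example, the rise in clients served comes with a sharp shift from chronic to acute cases. That shift is exactly what the reciprocal indicator is meant to expose.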

The World Health Organization (1971) has specified criteria for the construction of indices of public health that also seem appropriate for performance measurement in most programs. Among the criteria suggested are completeness of coverage; ease and low cost of calculation; wide acceptance; reproducibility, specificity, sensitivity, and validity; and the availability and other characteristics of the data to be used in the indices (see next section).

Performance measurement indicators can be chosen in several ways. Probably the most usual approach is for the managers of each program to have their staff choose indicators for the program. This approach has the advantage of relevance to what is actually going on in the program and of acceptability to the managers as a basis for action. The more intimately the managers themselves are involved, the more committed they can be expected to be. Commitment by particular managers matters less, however, in programs with high turnover of managers than in those with low turnover.

Another way to choose indicators is to obtain the consensus of experts through workshops, discussions, or such formal procedures as the nominal group technique (Delbecq et al. 1975; Becker 1979). Since outside experts will often have greater experience with a wide variety of programs and greater interest in theory than program staff, this expert-based approach is likely to lead to measures with greater interprogram utility and relevance to theory.

A third way to obtain indicators is to use this task as an opportunity for educating, and developing commitment among, the host of units that must provide data for performance measures, organizations representing consumer interests in the program, and industries and organizations speaking for the care-providing sector. These segments can best participate in choosing indicators by reacting to suggestions developed by others. In this reactive process, they may suggest additional indicators that are feasible, or provide evidence that some indicators others have proposed will not be feasible, either technically or because of reactions by the public or clinicians.

In practice, each of these approaches should be used to some extent, in order to obtain the special benefits each provides. In addition, these approaches probably will be supplemented by a strong practical bias toward data that are currently available routinely or could economically be made so, as well as toward types of indicators expected to yield information that makes the program appear effective. For example, indices generating large numbers (such as patient contacts during a time period) quite often are preferred for competitive public relations purposes over those producing small numbers (such as episodes of patient care).

A commonsense view of performance measures suggests that the ideal measure would deal with the outcome or impact of the service (e.g., changes in clients, communities, or service systems) relative to cost, since these are the bottom lines for the program. On the other hand, this summative information does not show how a program might be changed to improve, and it is often very difficult or costly to obtain.

Bernstein and Freeman (1975) have argued that studies of adequate quality should include measures both of service processes and of outcome. Rossi (1978) has suggested three different types of evaluation: one to establish the scientific concepts used to design programs, a second to field test the feasibility of a program based on these concepts, and a third to monitor whether the program operates appropriately in day-to-day practice. Rossi's first stage, evaluating program concepts, calls for relating service processes to outcome. This type of study requires experimental control of variables and is likely to be costly. Once done and replicated for generality, however, evaluations to validate program concepts would not need to be repeated in all facilities. This type of generalizable research should be funded by government, not by local service agencies. Once programs are validated, the other two forms of evaluation can simply use the validated process measures as the dependent variables. For this reason, it seems that performance measurement systems serving accountability purposes should focus on service process measures that have known relationships to outcome.




Other reasons also argue for placing emphasis on process measures. For many services, the link between the service and its effects on clients remains so tenuous that it is not fair or reasonable to base an accountability system on outcomes (Greer et al. 1978). In addition, the types of outcomes appropriate for clients, their families, and communities are too numerous to be measured feasibly for most types of services. The connection between interventions and outcomes is obscured by the immense variation in the clients' conditions and environments, in who makes the interventions, and in client-intervenor interactions (Parloff 1979). Long periods of time may be necessary for outcomes to become manifest, and often the degree of success is not precisely known.

McAuliffe (1979) has presented compelling arguments that "Contrary to current practice, outcome measures must be empirically validated just as process measures must, for outcome measures of quality [of care] are not obviously valid" (p. 124). When "end results [such as client satisfaction] have determinants in addition to [the services in which we are interested] the variance associated with these other factors is systematic measurement error, a type of invalidity" (p. 123).

Yet another reason for preferring program process measures over outcome measures is that a system that rewards good outcome might distort programs by rewarding the selection of clients for whom a good outcome is expected. Rewarding good outcome without controlling for the severity of problems would increase existing incentives to serve clients with minor problems and favorable prognoses, and to avoid the underserved clients whose lives are replete with problems: the poor, old, uneducated, unemployed, chronically ill, and ethnic minorities.

Both process and outcome indicators, of course, can be derived for the specific program objective used as an example in the first element of the model, i.e., increasing equity of services to minorities. An indicator of service process might be the relative nonwhite utilization rate: the admission rate per thousand nonwhite residents of the service area, expressed as a percentage of the admission rate per thousand white residents. A value of 100 percent means that whites and nonwhites are served at the same rate (Windle et al. 1979). Scores less than 100 percent would indicate proportionately less service to minorities than to whites and might be considered inappropriate, unless special conditions exist (such as other agencies in the area providing specialized services to ethnic minorities). Thus, use of this statistic requires recognition of its potential inapplicability (e.g., when there are few minorities or if the minorities in the service area are affluent).
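Written out, the relative nonwhite utilization rate is 100 × (nonwhite admissions per thousand nonwhite residents) ÷ (white admissions per thousand white residents). The sketch below computes it. The admission and population figures are invented for illustration. The `min_nonwhite_pop` cutoff is a hypothetical way to flag the small-population case mentioned above and is not a published standard.

```python
def rate_per_thousand(admissions, residents):
    """Admissions per 1,000 residents of the service area."""
    return 1000.0 * admissions / residents

def relative_nonwhite_utilization(nw_admissions, nw_residents,
                                  w_admissions, w_residents,
                                  min_nonwhite_pop=1000):
    """Nonwhite admission rate as a percentage of the white admission rate.

    100 means both groups are admitted at the same rate per thousand residents;
    values below 100 mean proportionately less service to nonwhites.
    Returns None when the nonwhite population is too small for the index to be
    meaningful (min_nonwhite_pop is a hypothetical cutoff).
    """
    if nw_residents < min_nonwhite_pop:
        return None
    nonwhite_rate = rate_per_thousand(nw_admissions, nw_residents)
    white_rate = rate_per_thousand(w_admissions, w_residents)
    return 100.0 * nonwhite_rate / white_rate

# Illustrative figures only: 6 vs. 8 admissions per thousand residents.
print(relative_nonwhite_utilization(120, 20_000, 800, 100_000))  # 75.0
```

A result of 75 percent would indicate proportionately less service to nonwhites. It would still need to be interpreted against the special conditions noted above, such as specialized services offered by other agencies.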

An outcome measure of equity of service might be the similarity in improvement rates for nonwhite and white patients. Measuring differences in the types of treatment given to white and nonwhite clients might serve as a reciprocal indicator. It could reveal, for example, whether nonwhites are being shunted into group therapies while whites are being directed to individual therapies, thus increasing the number of nonwhites served while providing different service to whites and nonwhites.
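The outcome measure and its reciprocal indicator can be computed side by side. The sketch below assumes that improvement counts and treatment-modality labels are available for each group. The data and function names are hypothetical. The example is constructed so that improvement rates are equal while treatment assignment differs sharply, which is the pattern the reciprocal indicator is meant to catch.

```python
def improvement_ratio(nw_improved, nw_treated, w_improved, w_treated):
    """Nonwhite improvement rate as a percentage of the white improvement rate.
    100 means the two groups improve at the same rate."""
    return 100.0 * (nw_improved / nw_treated) / (w_improved / w_treated)

def group_therapy_share(modalities):
    """Share of clients assigned to group rather than individual therapy.
    `modalities` is a list of labels such as "group" or "individual"."""
    return modalities.count("group") / len(modalities)

# Illustrative data only.
outcome = improvement_ratio(30, 60, 50, 100)  # both groups 50% improved
nonwhite = ["group"] * 3 + ["individual"]
white = ["group"] + ["individual"] * 3
gap = group_therapy_share(nonwhite) - group_therapy_share(white)
print(outcome, gap)  # 100.0 0.5
```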

Since service processes are easier to measure than treatment outcomes and will probably be used in lieu of treatment outcomes in a performance measurement system, research on the relationship between processes and outcomes is critical. The assumption that applying appropriate treatment processes will produce the desired treatment outcomes is tenable only if a strong conceptual and empirical connection exists between treatment processes and outcomes. In the final analysis, the credibility of any system of performance measurement based on process measures will rest on the empirical links between treatment processes and discernible benefits to clients and/or the public. Because different outcomes will be meaningful to different users of the information, research to validate performance indicators will need to be done for a variety of types of clients and against a variety of dimensions of treatment outcome.

There are limits to the types of information that are appropriate for use as performance indicators. Client satisfaction, for instance, should be included in a performance measurement system only under certain conditions. As usually administered by staff of the agency being evaluated, measures of client satisfaction tend to have little discriminative value, because clients almost universally report high levels of satisfaction, and what clients mean when they indicate satisfaction remains unclear (Gutek 1978). Such measures provide little useful information and are readily misused for promotional purposes. Only if client satisfaction assessment is done under clearly neutral or client-advocate auspices (Windle and Paschall 1981), and the conditions of administration and sampling are clearly defined, might such a measure be a reasonable part of a performance measurement system.

Our model as portrayed in figure 1 appears to be structured according to a sequence of purposive logic by managers, making the first two elements prerequisites for the last two. In practice, however, the last two elements (the data system and a system for use of measures) are in part simultaneous and interactive with goal and indicator specification. The indicators ultimately chosen will depend upon what data can be generated and whether statistics based on these indicators are feasible to use. For example, although an index may in fact prove to be an accurate measure of a phenomenon, the index may lack face validity, give an appearance of unfairness, or be too complex to be understood easily. In such cases, considerations relating to practical use of the measure would override its technical adequacy.

Establishment of Reliable Sources of Data

An essential requirement for monitoring programs through performance measures is the availability of ongoing systems for collecting, processing, and analyzing program data that meet predefined criteria of uniformity and comparability.

Government-supported programs that consist of many local agencies scattered about the country involve both lateral and vertical variations in data systems. Local units are likely to differ widely in the amount and type of detail in their information systems and in the definitions used. Much of this variation is likely to be imposed on them by States providing support or exercising regulation. Vertically, too, different echelons must answer different types of questions. Sometimes this vertical variation follows a pyramidal pattern, with detail decreasing at each higher echelon. Decisionmakers at high echelons often are satisfied with aggregate data for the program as a whole over long periods of time, based on sampling. Local agencies need information for subunits, individual clients, and staff members for short time periods. Paradoxically, higher echelons often are able to get so much information on so many local units that they can find many reliable differences between program options, while local agencies lack enough information on their units to make even crude judgments between their options.

As the role of government in society has expanded, increases have occurred both in the number of local agencies that must provide data to the government and in the number of government agencies and programs that require statistical data to support policy decisions (Bonner et al. 1980b). This has created pressures for centralization of statistical systems at the Federal level and for congruence of systems between Federal and State levels. Federal leadership in these programs brings an opportunity to reduce incompatibilities among information systems, which may in turn reduce redundancy in reporting efforts by local agencies.

Several actions might be taken to increase the integrity, reliability, relevance, and feasibility of future data collection systems (Bonner et al. 1980a). For example, the operational characteristics of specific performance measures need to be established, so as to determine what program conditions, clientele characteristics, and reporting requirements cause measures to vary. Standardization of management information system structure and use, however, can freeze conceptualizations and assumptions, thus stifling innovation and limiting the appropriateness of service. Comparability should be sought in a way that permits interprogram variation in treatment assumptions and approaches.

Another major problem in many information systems is the lack of two-way flow of information, influence, and products between the data suppliers in local units and the data analyzers in central governments. Suppliers need rapid feedback of the processed information. The feedback should come in a format that shows the information makes sense and has been enriched by centralized processing (e.g., supplemented by comparative or trend data to show relative positions). Without such feedback, local agency staff will lose motivation to provide accurate information promptly. Unforeseen difficulties are likely to prevent the central unit from providing prompt, meaningful feedback, especially when information systems are newly established and not all local units have developed procedures to report meaningful information promptly. This failure discourages those local units that took the request for data seriously. By the time the information system acquires the ability to feed information back promptly, local units may have lost interest in ensuring that data are accurate. When local units are at different developmental stages, and the central processing unit also must develop, it is difficult to maintain at the local units the cooperative and optimistic attitudes necessary to ensure accurate reporting.

This motivational problem is compounded by a lack of knowledge about what types of feedback local units will value. For efficiency, feedback is likely to be sent to all units in a uniform format, yet units and individuals will have idiosyncratic preferences. Many, of course, will not see much use even in comparative data or will need assistance in using it. Some managers will want strong evidence of the accuracy of the information, or of its implications for action, before they pay serious attention to it. For example, an index may show that a particular CMHC differs from most CMHCs in serving a lower proportion of ethnic minorities than whites in the service area. However, centers may differ in how completely they report (e.g., excluding emergencies and inpatients), in their approaches to reaching various population subgroups (e.g., serving drug abusers anonymously or through other agencies), in financial status, or in what other service agencies are used in their communities. If they do, the meaning of the comparative data will be unclear.

Some managers may be put off by the quantitative nature of the information. Others may be put off by the lack of particular indices or controls they feel are important. Still others may be put off by the large amounts of information the data processing unit may feed back in an attempt to give local users maximum flexibility or choice. The choice may be so overwhelming that local users simply discard the entire package. To help CMHCs plan and evaluate their programs, NIMH provided them with descriptive and comparative data on a large number of indices for both the catchment area population and the CMHC services (Rosen et al. 1975; NIMH undated). The volume of indices in these packages was overwhelming. Some recipients were likely deterred or discouraged by what may have seemed to be an indiscriminate and sometimes contradictory array of statistics of differing but uncertain reliability and meaning. A smaller number of more fully described indices might have been more useful to centers.

Another consideration about feedback is the psychological importance of timeliness, even when timeliness has little practical consequence. NIMH's comparative data for centers were often provided several years after the CMHCs reported to NIMH. However, the distribution of index values across the population of CMHCs changed little from one year to the next, so prior-year distributions for all centers were quite adequate for centers to use in interpreting their own current standing on particular indices. Continuing complaints from CMHCs about the delay in providing comparative data showed how difficult it was to educate users about practical uses of the comparative data.
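Interpreting a center's current standing against an older distribution amounts to computing a percentile rank. The sketch below is one way to do this and is not the method NIMH used. It assumes the prior-year index values for all centers are available as a list and counts ties as half. The ten values shown are invented.

```python
from bisect import bisect_left, bisect_right

def percentile_standing(current_value, prior_year_values):
    """Percentile rank of a center's current index value within the distribution
    of the same index across all centers in a prior year (ties count as half)."""
    data = sorted(prior_year_values)
    below = bisect_left(data, current_value)
    ties = bisect_right(data, current_value) - below
    return 100.0 * (below + 0.5 * ties) / len(data)

# Hypothetical prior-year values of one index for ten centers.
prior = [40, 55, 60, 70, 85, 90, 95, 100, 110, 120]
print(percentile_standing(75, prior))  # 40.0
print(percentile_standing(90, prior))  # 55.0
```

Because the distribution shifts little from year to year, a standing computed against last year's values is usually close to the one current values would give.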

Studies of management information systems that would improve the meaningfulness of indicators include:

1. Year-to-year reliability assessments of indices from local facilities. These assessments should be made for sets of facilities that are not thought to have changed much in relative standing. In addition, changes in indices should be examined for facilities expected to have changed in ways related to the indices.

2. Relationships between indices. For example, volume of service may be measured by number of problems, episodes of care, or other units. Knowledge of the relationships among these indices for any type of service would be important in determining their interchangeability and whether more than one index should be used. These different ways to measure volume also may differ in other characteristics, such as ease of routine recording, reliability, susceptibility to manipulation by service staff, and verifiability by outsiders.

3. Ethnographic studies of the processes of data collection, aggregation, and reporting in service agencies. These studies would help to identify the vulnerabilities of indices to distortion under various conditions, the meaning that can safely be given to indices, and the safeguards that should be applied in the continued use of indices.

4. Determining the trade-offs between negative features of assessments, such as costs and burden on staff and patients, and such positive features as the amount, accuracy, reliability, and timeliness of information. While there has been much complaint about the burden of data reporting, the degree to which those called on to report control the uses to which the reports are put may be a more crucial issue.

5. Field tests of client-developed and client-managed or public-managed satisfaction or complaint measures and procedures (Windle and Paschall 1981).




6. Assessment of distortions from self-reporting. Many national reporting systems are used primarily to obtain information about the national program as a whole and only secondarily as feedback to the local units or for use in oversight. Use of information systems for grant monitoring changes the incentives in local units for providing accurate information. When funding decisions are based in part on these data, a system of checks or audits might be necessary.
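Studies 1 and 2 in the list above both ask how closely two sets of index values agree. Study 1 compares the same facilities in two years. Study 2 compares two volume indices across facilities. Spearman rank correlation is one common statistic for this purpose. It is a technique swapped in here for illustration, since the text does not prescribe a statistic. The sketch below implements it with the standard library, and the facility values are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical index values for the same five facilities in two years.
year1 = [12, 30, 25, 8, 40]
year2 = [14, 28, 27, 9, 45]
print(spearman(year1, year2))  # 1.0: relative standing unchanged
```

For study 2, the same function could be applied to, for example, problem counts and episode counts for the same facilities. A coefficient near 1 would suggest the two volume indices are largely interchangeable for ranking purposes.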

To continue our illustration of measuring equity of mental health services to ethnic minorities, the specific indicators briefly described in our discussion of element two of the model would need to be incorporated into existing or planned data collection systems in ways that yield valid and reliable measurement of these indices. The ethnic status of service recipients would need to be observed and recorded in a way consistent with the classifications available for the entire population from the U.S. census. In the 1970 U.S. census, ethnic status and Spanish heritage were recorded separately, and persons of Spanish heritage could be of any race. Therefore, if one wants a measure of ethnic minorities rather than nonwhites, adjustments to the U.S. census data are needed. Some check on the consistency of ethnic classifications in any local agency may be advisable. Such checks will be especially important when estimates are used or when a high proportion of unknowns is involved.

A System for Use of Performance Measures

Elmore (1978) suggested that failure in program implementation, i.e., in the translation of policies into administrative action, may play a large role in the failure of public programs. He described four organizational models of social program implementation:

The systems management model treats organizations as value-maximizing units and views implementation as an ordered, goal-directed activity. The bureaucratic process model emphasizes the roles of discretion and routine in organizational behavior and views implementation as a process of continually controlling discretion and changing routine. The organizational development model treats the needs of individuals for participation and commitment as paramount and views implementation as a process in which implementors shape policies and claim them as their own. The conflict and bargaining model treats organizations as arenas of conflict and views implementation as a bargaining process in which the participants converge on temporary solutions but no stable result is ever reached.

Each model emphasizes different features of the implementation process, suggesting different ways to use the results of performance measures. The systems management model is in the mainstream of the rational tradition. It suggests that each program manager would manage better by having more meaningful data available. Since it assumes all managers want to act rationally to attain program goals, this view suggests using performance data to educate program managers.

The bureaucratic process model takes the perspective of outsiders who wish to control the behavior of the program staff to make sure the staff follows the formally prescribed goals of the program. This is the model the private sector decries as incentive-killing, inappropriately rigid government intervention into local and private functioning. Yet program evaluations reveal discrepancies between professed formal program goals and actual practice by agencies accepting public funds directly or indirectly. As they do, government efforts at reform move toward limiting agencies' freedom to violate formal expectations. Thus, governments have been primary advocates of accountability systems.

The organizational development model recognizes that the imposition of legal requirements is unlikely to work without the acceptance and cooperation of the program staff. This cooperation requires that the staff feel the requirements are reasonable and take the staff's values into account. This legitimation is greatly facilitated by the involvement of local agency staff or their representatives. Davis (1973) proposed a model of planned change that conditions the changes to be made upon the acceptance and other characteristics of those to be involved in the change. Attending to the needs of local staff and establishing a standardized accountability system are likely to conflict. This conflict may be minimized by involving local staff members in planning for the accountability system and making them custodians of its use. The type of use most consistent with this model is peer review of the results of the accountability system, aimed at the education and further development of the staff.




The conflict and bargaining model implies that implementing a performance measurement system is a strategic act to shape the behavior of others. This is much the same as in the bureaucratic process model, except that the bargaining model puts less emphasis on the specific content of the system and anticipates countermoves by local agency staff to use the performance measurement system to exact increased resources, autonomy, and status.

An agency might . . . put a great deal of effort into developing an elaborate collection of rules and regulations or an elegant system of management controls, knowing full well that it doesn't have the resources to make them binding on other actors. But the expectation that the rules might be enforced is sufficient to influence the behavior of other actors. The important fact is not whether the rules are enforced or not, but the effect of their existence on the outcome of the bargaining process. (Elmore 1978, pp. 220-221)

It would be expected that the threat of government action on performance accountability would make local agents temporarily more compliant in general, even if not necessarily on the specific issues that the performance measures address. Implementation, for this purpose, should be done in a way that maximizes the local units' anticipation of the enforcing agents' discretionary power, preserves flexibility in later enforcement, and counters the expected counterattack by the local agencies. As Elmore pointed out, each of these four models has some applicability.

Bjorkman and Altenstetter (1979) outlined a similar typology for accountability approaches and recommended that, for health care, it may be best to use and integrate multiple types of accountability. Different uses of performance measures may need to be staggered in time, following a developmental sequence. At the outset, there will be difficulties in obtaining information, ensuring its conformity to standard definitions, and getting the informed cooperation of local agency staff. Initially, therefore, program measures should be used only for service agency staff education and internal use, and not for aggregate national program evaluation or for policy or budget decisions by outside offices. Later, when the system has evolved to a point of stability, its data are accepted as meaningful, and use of the data is recognized as appropriate, contractual agreements might be made between funding agencies and service providers to set reasonable levels of performance on each indicator expected for individual programs of differing types. These expected performance levels might be in the form of absolute scores based on experts' judgments, relative scores based on frequency distributions of operating agencies, or change scores based on projected improvements from lower to higher levels of agency development.
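The three forms of expected performance level can be expressed compactly. The Python sketch below is illustrative only: the function names, the assumption that higher indicator values are better, and the half-credit treatment of tied peer scores are choices made for the example, not procedures specified in this chapter.

```python
from bisect import bisect_left, bisect_right

# Hypothetical helpers; each indicator is assumed to be oriented so that higher = better.


def meets_absolute(value: float, expert_level: float) -> bool:
    """Absolute form: compare with a level set by experts' judgment."""
    return value >= expert_level


def percentile_rank(value: float, peer_values: list[float]) -> float:
    """Relative form: percent of peer agencies scoring below `value` (ties count half)."""
    ordered = sorted(peer_values)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def meets_projected_change(current: float, baseline: float, projected_gain: float) -> bool:
    """Change form: did the agency improve at least as much as projected?"""
    return current - baseline >= projected_gain


peers = [0.2, 0.4, 0.5, 0.7, 0.9]  # hypothetical scores of agencies of the same type
print(percentile_rank(0.5, peers))  # 50.0
```

Computing relative scores only within a group of agencies of the same type keeps the comparison consistent with the idea of expected levels for "programs of differing types."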

Eventually, oversight agencies might set minimum levels of performance on indicators for all service agencies and negotiate specific goals for each local program. Minimum levels of performance might be specified for groups of programs serving similar populations in similar ways. Exemplary levels of performance, or of change in performance, also might be set to allow oversight agencies to reward service agencies for unusually good functioning. McAuliffe (1978) has suggested that agency standards might be set by using statistics for which the percentage of adequate care has been predetermined by examining the process of care. Possibilities are numerous, but no rigid specification of uses is desirable without wide participation and testing to ensure fairness and benefits to patients, providers, and the public.

The usefulness of performance measurements to oversight agencies is clear, but what of other potential users? Local agency staff and management might use performance data for self-evaluation. By comparing the performance of their own program with that of similar programs, they might detect opportunities to improve their services. Professional groups could use some performance measurements to judge whether service agencies are maintaining minimum professional standards.

The major goal of performance measurement is to improve services by increasing accountability to the public. Data from a performance measurement system would enhance the sharing of information with the public and consequently enhance the ability of citizen groups to understand and evaluate their local mental health programs. With reliable data, rather than opinions and impressions, these groups can reach better decisions about service agencies and justify their decisions more convincingly.

To fulfill all these functions, performance measures appropriate to each interest group need to be available. A system might be set up to ensure that each group receives the information it needs to make informed decisions and set policies, and to ensure that performance measures are used in grant-funding decisions by the Federal and State Governments. Ways to establish management systems using performance measures need to be thoroughly studied and carefully planned. Cooperative studies of the impacts on services of using performance measurement information in various ways are desirable. To some extent, States or regions that choose different oversight approaches offer natural quasi-experiments; when these differences in approach are sufficiently clear, comparable data on the results for services could be collected.

Research is needed also on the impact of professional self-regulation, such as that done by the Joint Commission on the Accreditation of Hospitals. What citizen groups can do with systematic data about local programs, and how they might do so, also warrant study, both through careful documentation of specially supported and natural demonstrations and through comparative studies. An accountability system that gave local citizens and communities the necessary data and responsibility for oversight and reinforcement might have more potential than remote governments to shape service providers' behavior constructively, because of the greater possibility for face-to-face control through positive reinforcement (Skinner 1977).

The question of standards needs careful, thorough, and wide consideration. If standards are defined as cutoff levels on some dimensions reflecting adequacy of performance, the great differences in client populations and treatment approaches among service providers may make such definitions inequitable and dangerous. Standards may be inappropriate, stifle change, be hard to get agreement on, be hard to apply flexibly to individual cases, exceed the capabilities of current measurement technology, and even decrease competition and innovation. As seems often to occur with regulation, standards may benefit the regulated agencies rather than the public and do so at a high ultimate cost to the public.

The imposition of standards as a basis for funding decisions may make service agency managers discontented. While standards may benefit service providers that conform to traditional methods or deal with easy problems, they may be a threat to providers using innovative methods or treating chronic or difficult problems. They may inappropriately emphasize utilitarian efficiency at the expense of humanitarianism and justice (House 1976). The value of a performance measurement system depends on the intelligent use of its information in pursuit of the program's goals. Such use will call for creative acts by both service agencies and the government bureaucrats responsible for program oversight.

In terms of the example used here, ethnic minority utilization of services should not be put in the form of an independent, rigid performance standard. The level of minority utilization appropriate for each service agency depends on the services offered by other agencies and practitioners in the catchment area and on the size and conditions of the minority group. The difficulty of placing a clear judgment on the index of relative utilization rate for minorities makes its most appropriate use a soft, reversible one, such as screening to identify agencies that need technical assistance or incentives to increase the accessibility of services to minorities, or that should study what appears to be an inadequacy to determine what, if any, actions should be taken. This index might be used also to establish annual objectives for specific agencies and to measure achievement of these objectives.
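The chapter does not give the formula for the index of relative utilization. One common formulation, assumed here for illustration, divides the minority share of an agency's clients by the minority share of its catchment area population, so that 1.0 indicates utilization in proportion to population. The sketch below uses such an index only for soft screening; the 0.5 threshold, the field names, and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agency:
    name: str
    minority_clients: int
    total_clients: int
    minority_population: int   # in the catchment area
    total_population: int      # in the catchment area


def relative_utilization(a: Agency) -> float:
    """Assumed index: minority share of clients / minority share of population."""
    client_share = a.minority_clients / a.total_clients
    population_share = a.minority_population / a.total_population
    return client_share / population_share


def screen_for_review(agencies: list[Agency], threshold: float = 0.5) -> list[str]:
    """Flag agencies below a hypothetical threshold for study or technical
    assistance; a flag is a prompt for inquiry, not a finding of inadequacy."""
    return [a.name for a in agencies if relative_utilization(a) < threshold]


centers = [
    Agency("Center A", 45, 300, 20_000, 100_000),   # index about 0.75
    Agency("Center B", 10, 400, 25_000, 100_000),   # index about 0.10
]
print(screen_for_review(centers))  # ['Center B']
```

Because the flag only triggers review, a center with good reasons for a low index (for example, a nearby agency that specializes in serving the minority group) can be cleared without penalty, which keeps the use soft and reversible.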

Just as overly zealous use of this or any performance measure can harm services, damage also results from ineffective oversight. NIMH and DHHS Regional Offices undertook a small effort, using an educational approach, to improve CMHCs that had low relative minority utilization rates. CMHCs were informed of their relative standing on this statistic and requested to describe their plans to remedy the situation if they agreed it was inappropriate, as almost all did. Followup examination of this index in these centers revealed no change beyond what is expected from statistical regression toward the mean in a group chosen for its extreme scores (Windle and Wu 1981). Thus, what was agreed to be an inappropriate condition was allowed to persist through lack of sufficiently rigorous action.
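Why a group selected for extreme scores tends to look better on followup even when nothing has changed can be shown with a small simulation. The Python sketch below is not Windle and Wu's analysis; it assumes normally distributed true scores that stay fixed between years, independent measurement error in each year, and arbitrary parameter values.

```python
import random
import statistics


def simulate_regression(n_centers=500, n_selected=50, true_sd=1.0,
                        error_sd=0.7, seed=1):
    """Select the lowest-scoring centers in year 1 and re-measure them in year 2
    with no real change; return (year-1 mean, year-2 mean, expected year-2 mean)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n_centers)]
    # Each year's observed score = unchanged true score + fresh measurement error.
    year1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    year2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Choose the centers with the most extreme (lowest) year-1 scores.
    selected = sorted(range(n_centers), key=lambda i: year1[i])[:n_selected]
    before = statistics.mean(year1[i] for i in selected)
    after = statistics.mean(year2[i] for i in selected)

    # Expected followup mean from regression toward the population mean (0 here):
    # the year-1 deviation shrinks in proportion to the reliability of the measure.
    reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
    expected_after = reliability * before
    return before, after, expected_after


before, after, expected = simulate_regression()
print(f"year 1: {before:.2f}  year 2: {after:.2f}  expected from regression: {expected:.2f}")
```

The selected group "improves" on followup although no center changed. Only improvement beyond the regression-based expectation could reasonably be credited to an intervention.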


Hopes  and  Forebodings 

The concept of program performance measurement is testimony to our persistent faith in science and rationality to guide society. To some extent, this faith serves a symbolic function, identifying an ideal many believe should be aimed for, even while recognizing that we fall far below the ideal.

When we try to translate ideals into programs, we need to remain aware of practical problems. At least three types of problems arise in implementing a performance measurement system: (1) technological/economic, (2) political, and (3) moral. Technologically, we lack the knowledge of how to measure many aspects of programs or how to establish, at reasonable cost, reliable management information systems to provide routine data accurately. Developing such technology will take time and funds and may not have high priority.

Politically, the assumption that, at the present time in the United States, service-providing organizations and industries can easily be made to provide information that would weaken their own power is inconsistent with political realities. Alford's (1975) description of the symbiotic relationship in health care between regulators and the provider industries, organizations, and practitioners reveals the largely symbolic nature of many reform efforts. Polls (e.g., Union Carbide Corporation 1980) suggest that the public currently feels that government regulation has more negative than positive consequences. This widespread distrust of government undermines government efforts to establish effective accountability systems.

Morally, the focus on efficiency and cost savings, even though accompanied by the rationale of science and business know-how, runs partly counter to the values of humanitarianism, democratic participation, and distributive justice.

A focus on measurable features leads to a neglect of subtle, difficult-to-identify, or controversial criteria. Most agreement is likely on economic and physical items, such as cost and amount of service provided. Less agreement can be expected on psychological characteristics, such as clients' satisfaction (which are difficult to measure), and on sociological and political characteristics, such as distribution of power (which are controversial). Well-defined goals are harder to establish in client-centered organizations than in hierarchically organized agencies or those oriented toward the production of goods (Sjoberg 1975; Kelly 1980). This greater compatibility of measurement approaches with hierarchical structures leads evaluation to be oriented toward, and to accept, the structural constraints of the system being evaluated.

Some biases toward acceptance of the status quo are unconscious, such as common assumptions about the scarcity of goods and services and the identification of troubled clients, rather than society, as the seat of problems that need to be changed. Other biases result from strategic or personal choices to work with those with most power in order to improve the chance that results will be implemented. As Rule (1978) pointed out, social science knowledge benefits most those who possess that knowledge. Internally operated program evaluation will therefore bring the most benefit to program staff, and will do so most when these staff specify the types of information to be developed. The choice of shaping evaluations to benefit certain groups more than others is a value preference. When these groups differ greatly in well-being, as care providers and people in need of human services do, the choice might be considered an ethical one dealing with distributive justice (Sjoberg 1975; House 1976).

We have tried to suggest that many doubtful assumptions and practical problems arise in implementing a performance measurement accountability system. Increased awareness of and debate over the assumptions are needed, and solutions to the practical problems will take much research and the broad participation of those whose cooperation is needed.

Many programs seem to be trying performance measurement based on narrow participation and without initial research to validate measures or procedures. It is likely that the major benefit of these endeavors will be not the hoped-for improvements within the programs to which they are applied, but rather whatever lessons about measurement and management can be learned from these experiences for future application. It is, therefore, important to examine individual programs' attempts to implement performance measurement with an awareness of the state of the art of measurement and management.

References 

Alger, I. Accountability: Human and political dimensions. American Journal of Orthopsychiatry 50:388-393, 1980.

Alford, R.R. Health Care Politics: Ideological and Interest Group Barriers to Reform. Chicago: University of Chicago Press, 1975.

Becker, W.M. Method for establishing performance criteria through use of expert panels: An innovation. Psychological Reports 44:1247-1251, 1979.

Bernstein, I.N., and Freeman, H.E. Academic and Entrepreneurial Research: The Consequences of Diversity in Federal Studies. New York: Russell Sage Foundation, 1975.




Bjorkman, J.W., and Altenstetter, C. Accountability in health care: An essay in mechanisms, muddles, and mires. Journal of Health Politics, Policy and Law 4:360-381, 1979.

Bonner, J.T.; Duncan, J.W.; Goldstein, H.; and Hagan, R.L. Policy relevance and the integrity of statistics. Statistical Reporter 80:64-69, 1980a.

Bonner, J.T., and others. Improving the Federal statistical system. Report of the President's Reorganization Project for the Federal Statistical System. Statistical Reporter 80:197-212, 1980b.

Davis, H.R. Change and innovation. In: Feldman, S., ed. Administration in Mental Health. New York: Thomas, 1973. pp. 289-341.

Delbecq, A.L.; Van de Ven, A.H.; and Gustafson, D.H. Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes. New York: Scott, Foresman, 1975.

Edelman, M. Political Language: Words That Succeed and Policies That Fail. New York: Academic Press, 1977.

Elmore, R.F. Organizational models of social program implementation. Public Policy 26:185-228, 1978.

Eyman, R.K., and Windle, C. Comparative approaches to program evaluation. In: Siva Sankar, D.V., ed. Mental Health in Children. Vol. III. Westbury, N.Y.: PJD Publications, 1976. pp. 639-659.

Flaherty, E.W., and Morell, J.A. Evaluation: Manifestations of a new field. Evaluation and Program Planning 1:1-10, 1978.

Fontane, P.E. Improving program evaluation with reciprocal indicators. Social Indicators Research 2:211-221, 1975.

Garn, H.A.; Flax, M.J.; Springer, M.; and Taylor, J.B. Models for Indicator Development: A Framework for Policy Analysis. Washington, D.C.: The Urban Institute, 1976.

Glass, G.V., and Ellett, F.S., Jr. Evaluation research. In: Rosenzweig, M.R., and Porter, L.W., eds. Annual Review of Psychology. Palo Alto, Calif.: Annual Reviews, Inc., 1980. pp. 211-228.

Greer, S.; Hedlund, R.D.; and Gibson, J.L. Introduction: The accountability of institutions in urban society. In: Greer, S.; Hedlund, R.D.; and Gibson, J.L., eds. Accountability in Urban Society: Public Agencies Under Fire. Beverly Hills, Calif.: Sage, 1978. pp. 9-12.

Gutek, B.A. Strategies for studying client satisfaction. Journal of Social Issues 34:44-56, 1978.


House, E.R. Justice in evaluation. Evaluation Studies 1:75-100, 1976.

Ingram, L.C. Broken frame and emergent meaning: The case of student evaluation of teaching. Human Relations 32:803-818, 1979.

Institute for Social Research. Deepening Distrust of Political Leaders Is Jarring Public's Faith in Institutions. ISR Newsletter, Autumn 1979.

Kelly, R.M. Ideology, effectiveness, and public sector productivity: With illustrations from the field of higher education. Journal of Social Issues 36:76-95, 1980.

Keppler-Seid, H.; Windle, C.; and Woy, R.J. Performance measures for mental health programs: Something better, something worse, or more of the same? Community Mental Health Journal 16:217-234, 1980.

Landsberg, G.; Neigher, W.D.; Hammer, R.J.; Windle, C.; and Woy, J.R. Evaluation in Practice. DHEW Pub. No. (ADM)78-763. Washington, D.C.: U.S. Govt. Print. Off., 1979.

McAuliffe, W.E. On the statistical validity of standards used in profile monitoring of health care. American Journal of Public Health 68:645-651, 1978.

McAuliffe, W.E. Measuring the quality of medical care: Process versus outcome. Milbank Memorial Fund Quarterly 57:118-152, 1979.

National Institute of Mental Health. 1973 Profile for Federally Funded Community Mental Health Centers. Rockville, Md.: the Institute, 1973.

Neigher, W., and others. "The Last Program Evaluation Conference." Panel of presentations at the National Council of Community Mental Health Centers' annual meeting. New York, March 1982.

Parloff, M.B. Can psychotherapy research guide the policymaker? A little knowledge may be a dangerous thing. American Psychologist 34:296-306, 1979.

Pottinger, P.S. "Defining Competence in the Mental Health Professions." Presentation at American Psychological Association Convention, New York, Sept. 1979.

Rosen, B.M.; Lawrence, L.; Goldsmith, H.F.; Shambaugh, J.P.; and Windle, C. Mental Health Demographic Profile System Description: Purpose, Content, and Sampler of Uses. National Institute of Mental Health, DHEW Pub. No. (ADM)79-263. Washington, D.C.: U.S. Govt. Print. Off., 1975.

Rossi, P.H. Issues in the evaluation of human services delivery. Evaluation Quarterly 2:573-599, 1978.




Rule, J.B. Insight and Social Betterment: A Preface to Applied Social Science. New York: Oxford University Press, 1978.

Schick, A. From analysis to evaluation. Annals of the American Academy of Political and Social Science 394:57-71, 1971.

Schmidt, S.W.; Guasti, L.; Lande, C.H.; and Scott, J.C. Friends, Followers and Factions: A Reader in Political Clientelism. Berkeley: University of California Press, 1977.

Sjoberg, G. Politics, ethics and evaluation research. In: Guttentag, M., and Struening, E.L., eds. Handbook of Evaluation Research. Vol. 2. Beverly Hills, Calif.: Sage, 1975. pp. 29-51.

Skinner, B.F. Between freedom and despotism. Psychology Today, Sept. 1977.

Union Carbide Corporation. The Vital Consensus: American Attitudes on Economic Growth. New York: the Corporation, 1980.

Wertheimer, M., and others. Psychology and the future. American Psychologist 33:631-647, 1978.

Wholey, J.S. Evaluation: Promise and Performance. Washington, D.C.: The Urban Institute, 1979.

Wholey, J.S.; Nay, J.N.; Scanlon, J.W.; and Schmidt, R.E. Evaluation: When is it needed? Evaluation 2(2):89-93, 1975.

Williams, W., and Elmore, R.F. Social Program Implementation. New York: Academic Press, 1976.

Windle, C. Developmental trends in program evaluation. Evaluation and Program Planning 2:193-196, 1979.

Windle, C.; Neal, J.; and Zirm, H.K. Stimulating equity of services to non-whites in CMHCs. Community Mental Health Journal 15:155-166, 1979.

Windle, C., and Neigher, W. Ethical problems in program evaluation: Advice for trapped evaluators. Evaluation and Program Planning 1:97-107, 1978.

Windle, C., and Paschall, N.C. Client participation in community mental health center program evaluation: Increasing incidence; inadequate involvement. Community Mental Health Journal 17:66-76, 1981.

Windle, C., and Woy, J.R. When to apply various program evaluation approaches. Evaluation 4:35-37, 1977.

Windle, C., and Wu, I. Stimulating equity of CMHC services to non-whites: A follow-up based on expected regression. Community Mental Health Journal 17:306-309, 1981.

World Health Organization. Statistical Indicators for the Planning and Evaluation of Public Health Programmes. Geneva: the Organization, 1971.

Yankelovitch, D., and Kaagan, L. Proposition 13 one year later: What it is and what it isn't. Social Policy 10(3):19-23, 1979.




Part  I 

The  State  of  the  Art  of  Measurement 


Introductory  Comments 

The most crucial assumption of program performance measurement is that important characteristics of how a program or agency operates can be measured with reliability and validity. This assumption includes both the technical and the political feasibility of making accurate measurements of program functioning. Closely related to political feasibility is another assumption that usually motivates the initiation of performance measurement systems: namely, that benefits will ensue from such measurements because the program will operate more effectively, more responsively, or more efficiently. Each of these assumptions, of course, may apply only to certain types of performance, to certain types of programs, to certain types of organizations, and within certain types of political, economic, or cultural contexts. Further, the benefits from performance measurement may be either direct or indirect. The most usual expectation is that the information gained from measurements will improve the functioning of the organization or enable others outside the organization to act more wisely with respect to it.

The technical issues of feasibility of accurate measurement generally are the primary concerns of measurement specialists, who usually assume that the more precise and extensive the measurements, the better the social consequences. Conversely, issues surrounding the desirability of having a measurement depend on how managers and others will use the information and thus are usually considered the province of management. The precision of measurements, however, may change the type of management that seems feasible.


Construction  of  Performance  Measures 

Warner and Holloway (1978) have given a straightforward and simple description of seven steps involved in construction of performance measures.

1.  Decide  the  objectives  of  the  system 
whose  performance  is  being  measured. 

2.  Identify  the  system's  attributes,  which 
define  the  system's  objectives. 

3.  Develop the counting scheme for observing these attributes.

4.  Scale the preference for various counts or the relative importance of each attribute for achieving the objectives.

5.  Calculate  a score  for  the  system  using 
the  numbers  obtained  in  (4). 

6.  Test  the  score's  reliability. 

7.  Obtain  evidence  for  the  score's  content 
validity. 
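Steps 3 through 5 can be sketched in a few lines of Python. The attribute names, the count bands in each preference scale, and the importance weights below are hypothetical; in practice they would come out of steps 1 and 2 and the group techniques described next. The score is a weighted average of the scaled values, which is one simple way, not the only way, to combine them.

```python
# Hypothetical attributes, scales, and weights for illustration only.


def scaled_value(count: int, scale: list[tuple[int, float]]) -> float:
    """Step 4 (preference): map a raw count onto a preference value.
    `scale` lists (minimum count, value) bands; the highest band reached applies."""
    value = 0.0
    for min_count, band_value in sorted(scale):
        if count >= min_count:
            value = band_value
    return value


def system_score(counts: dict[str, int],
                 scales: dict[str, list[tuple[int, float]]],
                 weights: dict[str, float]) -> float:
    """Step 5: weighted average of scaled attribute values (weights = step 4 importance)."""
    total_weight = sum(weights.values())
    return sum(weights[attr] / total_weight * scaled_value(counts[attr], scales[attr])
               for attr in weights)


counts = {"intakes_within_7_days": 45, "followups_completed": 12}   # step 3 counts
scales = {
    "intakes_within_7_days": [(0, 0.0), (30, 0.5), (40, 1.0)],
    "followups_completed": [(0, 0.0), (10, 0.5), (20, 1.0)],
}
weights = {"intakes_within_7_days": 2.0, "followups_completed": 1.0}
print(round(system_score(counts, scales, weights), 3))  # 0.833
```

Steps 6 and 7 then ask whether this score is reproducible and whether knowledgeable people accept that it reflects the objectives.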

The first of these steps, choosing objectives, usually is regarded as the domain of program managers rather than of the evaluators or researchers who are likely to be assigned the task of designing a performance measurement system. This critical task of focusing performance measures on the program's goals was treated in the introduction. The measurement person's task begins more properly at the second step, identifying system attributes that define the objectives. This task is fundamental for content validity. Warner and Holloway suggest three techniques for arriving at system attributes: brainstorming, the nominal group technique, and the delphi technique.

Brainstorming is a group activity of producing imaginative ideas through mutual stimulation and then reviewing the pool of ideas to eliminate duplication and agree on which ideas to use. Warner and Holloway say that brainstorming is the most common technique but is less desirable than the nominal group and delphi approaches, which force the group to withhold judgment while attributes are being generated and eliminate status differentials between group members. The nominal group technique is so called because individuals in a group are forced to work separately, rather than collectively, to generate ideas. Ideas are then listed, discussed, and subsequently rank-ordered by each group member separately. The group then decides upon a cutoff point in the aggregate ranking of the ideas.
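The aggregation and cutoff steps of the nominal group technique can be illustrated as follows. The chapter does not say how individual rankings are combined; summing each idea's rank positions (lower totals rank higher) is an assumption made here, as are the example ideas and the requirement that every member rank every idea. Ties are broken alphabetically only so that the output is deterministic.

```python
def aggregate_ranking(rankings: list[list[str]]) -> list[str]:
    """Combine members' separate rank orders (first = most important) by summing
    rank positions; return ideas ordered from best to worst aggregate rank."""
    totals: dict[str, int] = {}
    for ranking in rankings:
        for position, idea in enumerate(ranking, start=1):
            totals[idea] = totals.get(idea, 0) + position
    return sorted(totals, key=lambda idea: (totals[idea], idea))


def apply_cutoff(ordered_ideas: list[str], cutoff: int) -> list[str]:
    """Keep the ideas above the cutoff point the group agrees on."""
    return ordered_ideas[:cutoff]


member_rankings = [
    ["waiting time", "client satisfaction", "cost per visit", "staff turnover"],
    ["client satisfaction", "waiting time", "staff turnover", "cost per visit"],
    ["waiting time", "cost per visit", "client satisfaction", "staff turnover"],
]
ordered = aggregate_ranking(member_rankings)
print(apply_cutoff(ordered, 2))  # ['waiting time', 'client satisfaction']
```

Because each member ranks privately before totals are formed, no single high-status member can dominate the result, which is the advantage the text attributes to the technique.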

The delphi method is similar to the nominal group technique, except that the group never meets. After ideas have been generated and rated by group members individually, each member receives a summary of the ratings of the total group and is asked to make the ratings again. Convergence in judgments tends to occur.
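A toy model shows why convergence tends to occur. It assumes, purely for illustration, that after seeing the group summary (here, the median) each panelist moves a fixed fraction of the way toward it; real panelists revise their judgments less mechanically, and the ratings below are invented.

```python
import statistics


def delphi_rounds(initial_ratings: list[float], rounds: int = 3,
                  pull: float = 0.5) -> list[list[float]]:
    """Toy Delphi model: each round, panelists see the group median and move a
    fraction `pull` of the way toward it. Returns the ratings after every round."""
    history = [list(initial_ratings)]
    ratings = list(initial_ratings)
    for _ in range(rounds):
        median = statistics.median(ratings)
        ratings = [r + pull * (median - r) for r in ratings]
        history.append(ratings)
    return history


for round_number, ratings in enumerate(delphi_rounds([2, 4, 5, 7, 9])):
    print(round_number, max(ratings) - min(ratings))  # spread: 7, 3.5, 1.75, 0.875
```

With `pull = 0.5` the median itself does not move, so the spread of ratings halves each round. Convergence of this kind shows agreement, not necessarily correctness, of the panel's judgment.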

Systems for counting the attributes to be used in measurement must be accurate but not costly or obtrusive, and they should not distort the attribute being measured. Scaling rules, which have to do with assigning reasonable numbers to observations, are not well known by administrators. Sometimes the count of an attribute itself provides the scale by which to describe it. At other times, however, some of the numbers in the count may need to be grouped to reflect meaningful distinctions. For example, a program that fails to meet a given requirement more than 50 percent of the time may be seen as no better than one that never meets the requirement. In such a case, the scale for how often the requirement was met should match the program staff's judgments of what is meaningful.
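A grouped scale of the kind just described might look like the function below. The rule that meeting a requirement less than half the time scores the same as never meeting it follows the example in the text; the cut points above 50 percent and the 0 to 3 range are hypothetical and would be set with program staff.

```python
def compliance_scale(times_met: int, opportunities: int) -> int:
    """Map how often a requirement was met onto a hypothetical 0-3 scale.

    A program that fails more than half the time (meets it < 50 percent of the
    time) scores 0, the same as a program that never meets the requirement.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    rate = times_met / opportunities
    if rate < 0.50:
        return 0
    if rate < 0.75:
        return 1
    if rate < 0.95:
        return 2
    return 3


print([compliance_scale(k, 20) for k in (0, 9, 10, 14, 16, 19)])  # [0, 0, 1, 1, 2, 3]
```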

Sometimes it is not possible to get a count of an attribute; instead, a judgment along a rating scale must be used. Part of what must be specified in the scale is whether the numbers used in the count or in judgments of an attribute can be considered ordered along a continuum from lowest to highest, and whether the distances between the numbers along the scale are equal.
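One way to make this part of the scale specification explicit is to record each measure's level of measurement and let it govern which summary statistics are used. The sketch below uses the conventional nominal, ordinal, interval, and ratio levels; pairing each level with a single statistic is a simplification assumed for illustration.

```python
import statistics
from enum import Enum


class Level(Enum):
    NOMINAL = "categories only"
    ORDINAL = "ordered; distances between values unknown or unequal"
    INTERVAL = "ordered; equal distances between values"
    RATIO = "equal distances and a true zero"


def central_tendency(values, level: Level):
    """Pick a summary statistic the level of measurement can support."""
    if level is Level.NOMINAL:
        return statistics.mode(values)          # most frequent category
    if level is Level.ORDINAL:
        return statistics.median_low(values)    # an observed scale point, no averaging
    return statistics.mean(values)              # averaging assumes equal distances


ratings = [1, 2, 2, 5]  # hypothetical ratings on a five-point scale
print(central_tendency(ratings, Level.ORDINAL), central_tendency(ratings, Level.INTERVAL))  # 2 2.5
```

The same four ratings summarize to 2 or 2.5 depending on whether equal distances between scale points are assumed, which is why the assumption needs to be stated.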

Once scores have been calculated, tests of reliability need to be done. Different observers should obtain similar results in their measurements, and a given observer should obtain the same results from measurement of an attribute at different times if no changes have occurred in that attribute.
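The two reliability checks just described can be computed from paired scores. The sketch below uses the Pearson correlation for both inter-observer and test-retest consistency and adds a simple exact-agreement rate, because two sets of scores can correlate perfectly while one is consistently higher. The choice of statistics and the ratings are illustrative assumptions; other coefficients (for example, kappa for categorical judgments) may suit particular measures better.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def exact_agreement(x: list[float], y: list[float]) -> float:
    """Share of cases on which the two sets of scores are identical."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Inter-observer: two observers rate the same five programs.
observer_a = [3, 4, 2, 5, 4]
observer_b = [3, 5, 2, 4, 4]
# Test-retest: one observer rates the same (unchanged) programs twice.
time_1 = [3, 4, 2, 5, 4]
time_2 = [4, 5, 3, 6, 5]

print(round(pearson_r(observer_a, observer_b), 2), exact_agreement(observer_a, observer_b))
print(round(pearson_r(time_1, time_2), 2), exact_agreement(time_1, time_2))
```

In the test-retest example the two occasions correlate perfectly yet never agree exactly: the observer drifted upward by one point, which a correlation alone would hide.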

Content validity requires that knowledgeable people agree that following the specified procedures will yield a measure of the intended characteristic. This can be checked by asking persons involved with the program whether the performance measure appears to capture the aspect of performance it was designed to measure. Such queries should be made before the measurement is made, so that judgments are not influenced by whether the results of the measurement make the program look good or bad.
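Judges' agreement can be summarized numerically. One established option, not mentioned in this chapter and assumed here only as an example, is Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where N judges rate a measure and n_e of them rate it essential to the characteristic of interest. It ranges from -1 (no judge agrees) to +1 (all agree). Following the advice above, the judgments would be collected before any results are seen; the panel below is hypothetical.

```python
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    if n_judges <= 0 or not 0 <= n_essential <= n_judges:
        raise ValueError("need n_judges > 0 and 0 <= n_essential <= n_judges")
    half = n_judges / 2
    return (n_essential - half) / half


# Hypothetical panel: 8 of 10 program staff judge that a proposed "waiting time"
# measure captures the accessibility of services.
print(content_validity_ratio(8, 10))  # 0.6
```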


Special  Measurement  Problems 

Some special problems arise in many measurements of program performance because of the complex nature of the phenomena to be measured and the fact that such measures usually must be compared with some standard or population of comparative measures to be interpreted meaningfully.

Ratios  and  Differences 

Many program characteristics involve the relationship between two associated variables. For example, the number of clients served per 1,000 program hours would be expressed as a ratio. The level of client functioning at the end of the program compared with the level of functioning at the beginning of the program would be expressed as a difference score, and possibly as a ratio score as well. Variables made up of other variables combine the errors of measurement within their component parts and risk additional error from the computations used to derive them. These sources of error are not intuitively obvious to nonstatisticians and therefore warrant special care in how such variables are used (Long 1979; Johns 1981).
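Two standard results from classical measurement theory make the point concrete; neither is taken from the sources cited above. If pre- and post-program scores have equal variances, reliabilities r_pre and r_post, and correlation r_pre_post, the reliability of their difference is ((r_pre + r_post)/2 - r_pre_post) / (1 - r_pre_post). For a ratio whose numerator and denominator carry independent errors, the relative errors combine, to a first approximation, as the square root of the sum of their squares. The numbers in the example are hypothetical.

```python
import math


def difference_score_reliability(r_pre: float, r_post: float, r_pre_post: float) -> float:
    """Reliability of (post - pre), assuming equal variances of the two scores."""
    return ((r_pre + r_post) / 2 - r_pre_post) / (1 - r_pre_post)


def ratio_relative_error(rel_err_numerator: float, rel_err_denominator: float) -> float:
    """First-order relative error of a ratio whose two parts have independent errors."""
    return math.hypot(rel_err_numerator, rel_err_denominator)


# Two functioning ratings, each with reliability 0.80, that correlate 0.60:
print(round(difference_score_reliability(0.80, 0.80, 0.60), 2))  # 0.5
# Clients served (5% error) per 1,000 program hours (10% error):
print(round(ratio_relative_error(0.05, 0.10), 3))  # 0.112
```

Two ratings that are each reasonably reliable can yield a change score of much lower reliability when they are strongly correlated with each other, and a ratio is always less precise than its less precise component.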

Many of the problems of measurement of organizational performance are similar to those encountered in the development of social indicators. Developers and users of program performance measures can benefit from the growing literature in this allied field (e.g., Rossi and Gilmartin 1980).

Accuracy  of  Performance  Measures 

An appraisal of the degree of accuracy currently possible in measuring organizational performance is available in Price's Handbook of Organizational Measurement (1972). Price examined the literature for existing measures of organizations, covering 28 concepts such as absenteeism, centralization, communication, coordination, effectiveness, innovation, routinization, satisfaction, size, and span of control. Only 22 of these concepts had measures existing in the literature. The author's evaluation of the technology in this area was that:

The level of organizational measurement could be significantly improved, to state the matter charitably. Many (perhaps most) organizational measures are based on a single piece of information rather than on combinations of different pieces of information. In short, the typical organizational measure is an indicator rather than an index. Or again, most measures are accompanied by no information that explicitly discusses their validity and reliability. Given the low level of organizational measurement, . . . [too rigorous standardization is inappropriate].

(pp. 1-2)

Almost all of the measures in this handbook are nominal and ordinal measures . . . interval and ratio measures have not generally been developed for the concepts used to study organizations. (p. 3)

A similarly sobering assessment comes from the review by Morrissey, Hall, and Lindsey (1982) of existing measures of interorganizational relations for mental health programs.

Much of the work published to date has relied on an extrapolation of concepts from interorganizational research (Marrett 1971) and the use of "face valid" or judgmental criteria in selecting questionnaire items or other indicants of these concepts. In addition, many of the measures are based on single rather than multiple items. Of the 35 measures presented in this sourcebook, 17 (49 percent) are based on single-item indicants, a practice which precludes the derivation of empirical estimates of the reliability and validity of the observations from the data generated in a single study (Blalock 1970). Even in those situations in which multiple-item indices are employed, it is rare for investigators to report information on their measurement properties. In addition, many widely used measures tap only part of the referent concept.
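The quoted point about single-item indicants can be made concrete with an internal-consistency coefficient. The sketch below uses Cronbach's alpha, which is one common estimate and is not necessarily the one the sourcebook authors had in mind. The function and the ratings are hypothetical. Alpha is built from the way items vary together, so it cannot be computed at all for a one-item measure.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a multiple-item index (hypothetical helper).

    items: one list of scores per item, each covering the same
    respondents in the same order.
    """
    k = len(items)
    if k < 2:
        # A single-item indicant gives no inter-item information to work from.
        raise ValueError("alpha requires at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # index score per respondent
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("index scores do not vary")
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 3-item coordination index rated for five agencies.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 3],
]
print(round(cronbach_alpha(items), 2))
```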

Relationships  Between  Measurement 
and  Management 

Several types of relationships can be seen between measurement and management. The most obvious is that management can control an organization to an increasing degree as its information becomes more specific (Thompson 1967).

A number of investigators (e.g., Thompson 1967; Garn et al. 1976; Ouchi 1980) have argued that relationships exist between the level of development of a technology and the type of management structure or strategy most appropriate for agencies applying the technology. Litwak and associates (1970) identified certain types of tasks that can be carried out better by primary groups than by experts in bureaucratic institutions. Tasks that primary groups do better are those requiring no real knowledge, those so simple that anyone can do them, or those that are unpredictable or require great speed.

We usually assume that oversight will produce more responsive, compliant performance in the overseen agency. This assumption embraces several additional assumptions, one of which is that the values the oversight agency or administrator wishes to enforce are more in the general public interest or the program's interest than are those the overseen agency would otherwise follow. This assumption is likely to be valid to the extent that the program structure permits program personnel to satisfy their personal wishes by distorting the program from its alleged goals (e.g., by wasting time, using program resources for personal benefit, or using program power for personal pleasure).

Another assumption is that the accountability structure will not lead to defensive behavior incompatible with the primary mission. Some investigators have reported conditions in which this assumption did not hold, where accountable behavior was found less desirable than behavior under less accountability.

Adelberg and Batson (1978) found experimental evidence that when clients' needs exceed available resources, making a helping agent accountable to either the provider or recipients impairs effective use of resources. Cvetkovich (1978) found that decisionmakers who expected to be personally accountable for decisions shifted to a form of thinking that is analytic and easily described to others, whereas those deciding only for themselves used more intuitive, difficult-to-describe processes.

Effects also may follow from the process of measurement. One negative effect is the channeling of resources and effort from the primary program functions into measurement and reporting. Another may be the reshaping of primary functions by their conjunction with the data-gathering process, which may make them more systematic on the one hand (missing fewer cases) or less responsive to idiosyncratic aspects of particular cases on the other.


Issues  in  Measurement 

The papers presented here on the state of the art of measurement deal with underlying prerequisites and contrasting approaches to program measurement. In most programs, measurements depend upon the existence of management information systems. Attkisson and Broskowski describe the characteristics of management information systems necessary to support performance measurement.

One of the major choices in designing evaluations of programs is whether to focus on service system processes or on program product outcomes. Service processes are often easier to measure than outcomes. Because they focus on activities that programs may be able to influence, studies of processes are likely to suggest actions the program can take to make improvements. On the other hand, there is much commonsense appeal to being able to point to the ultimate criterion, the impacts the program wishes to achieve. Scott et al. (1978) proposed that organizational managers might prefer structural measures of organizational characteristics, because they have control over such factors. The rank and file might prefer process measures of activities, because they control their own performance. Clients and customers prefer outcome measures, because they want results, not promises or effort.

McAuliffe has examined the arguments used to advocate outcome rather than process measurement, with particular reference to the health care field. His analysis "reveals that outcome measures are not clearly superior; they are less direct than process measures, they have major practical problems, and their validity has rarely been tested empirically." It is likely that many of McAuliffe's observations would apply equally well to domains other than health, but evidence about how broadly his thesis can be extended remains to be collected.

The terms process and outcome apply primarily to research on the provision of services. Performance measurement is more closely related to the interest of management in monitoring, evaluating, or controlling a program. Further, the focus on performance implies a businesslike orientation. Using business as a model implies that there are relatively standard products and that the measurement can consist of counting products and dividing by costs. As the public sector is attacked for being insufficiently productive and excessively costly, public service agencies and bureaucrats find it useful to adopt the terminology and ways of thinking of the for-profit sector. Hatry provides a review of the current status of productivity measurement in the public sector.


References 

Adelberg, S., and Batson, C.D. Accountability and helping: When needs exceed resources. Journal of Personality and Social Psychology 36:343-350, 1978.

Blalock,  H.J.,  Jr.,  ed.  Measurement  in  the 
Social  Services.  Chicago:  Aldine,  1970. 

Cvetkovich, G. Cognitive accommodation, language, and social responsibility. Social Psychology 41:149-155, 1978.

Garn, H.A.; Flax, M.J.; Springer, M.; and Taylor, J.B. Models for Indicator Development: A Framework for Policy Analysis. Washington, D.C.: The Urban Institute, 1976.

Johns, G. Difference score measures of organizational behavior variables: A critique. Organizational Behavior and Human Performance 27:443-463, 1981.

Litwak, E.; Shiroi, E.; Zimmerman, L.; and Bernstein, J. Community participation in bureaucratic organizations: Principles and strategies. Interchange 1:44-60, 1970.

Long, S.B. The continuing debate over the use of ratio variables: Facts and fiction. In: Schuessler, K.F., ed. Sociological Methodology. San Francisco: Jossey-Bass, 1979. pp. 37-67.

Marrett, C.B. On the specification of interorganizational dimensions. Sociology and Social Research 56(10):83-99, 1971.

Morrissey,  J.P.;  Hall,  R.H.;  and  Lindsey,  M.L. 
Interorganizational  Relations:  A Sourcebook 
of  Measures  for  Mental  Health  Programs. 
National  Institute  of  Mental  Health,  Series 
BN  No.  2.  DHHS  Pub.  No.  (ADM)82-1187. 
Washington,  D.C.:  Supt.  of  Docs.,  U.S.  Govt. 
Print.  Off.,  1982. 

Ouchi, W.G. A framework for understanding organizational failure. In: Kimberly, J.R.; Miles, R.H.; and associates, eds. The Organizational Life Cycle. San Francisco: Jossey-Bass, 1980. pp. 395-430.

Price, J.L. Handbook of Organizational Measurement. Lexington, Mass.: Heath, 1972.




Rossi, R.J., and Gilmartin, K.J. The Handbook of Social Indicators: Sources, Characteristics, and Analysis. New York: Garland STPM Press, 1980.

Scott, W.R.; Flood, A.B.; Ewy, W.; and Forrest, W.H., Jr. Organizational effectiveness: Studying the quality of surgical care in hospitals. In: Meyer, M.W., ed. Environments and Organizations. San Francisco: Jossey-Bass, 1978. pp. 351-368. Cited in: Kanter, R.M., and Brinkerhoff, D. Organizational performance: Recent developments in measurement. Annual Review of Sociology 7:321-349, 1981.

Thompson, J.D. Organizations in Action. New York: McGraw-Hill, 1967.

Warner, D.M., and Holloway, D.C. Decisionmaking and Control for Health Administration: The Management of Quantitative Analysis. Ann Arbor, Mich.: Health Administration Press, 1978.




Human  Service  Information  Systems 


C. Clifford Attkisson, Ph.D.
Department of Psychiatry
University of California, San Francisco

Anthony Broskowski, Ph.D.
Northside Community Mental Health Center
Tampa, Florida


Information: Any difference which makes a difference.

— Gregory  Bateson 


Information can be characterized by many attributes: reliability, precision, validity, timeliness, and costliness. These attributes are interdependent: Changes in any one will entail changes in the others. Ultimately, however, the question arises: Does the information being considered have any value? The recurring compulsion to design and implement large-scale information systems that span geographical, organizational, and jurisdictional boundaries must be viewed as a phenomenon begging this bottom-line question: Will the information make any difference? Historically, the evolution of national data systems has paralleled the growth of Government and big business. It has been limited only by the technological capacity to collect, store, retrieve, and process data from multiple sites and time periods. The latest expression of that evolutionary process is the recent effort at the State and Federal levels of Government to implement performance assessment systems (PAS).

Rationale  for  Performance 
Assessment  Systems 

Performance Assessment Systems assume that key indicators can reflect the performance of a complex operating system. This assumption demands that PAS designers understand the underlying principles that govern the operation of the system. The greater the complexity of the system, or the poorer its users' understanding of it, the more difficult is the task of identifying valid and sensitive indicators. Principles that govern the system's reactions to environmental variations also must be well understood if the designers wish to establish standards for indicators.


Additionally, a minimum set of indicators must be identified to make the PAS process efficient. The loss of comprehensive feedback is acceptable to the user of the PAS, since full knowledge may be too costly or unnecessary. Since indicators only reflect performance, the choice of indicators also must be made with the utmost concern for their sensitivity and validity. All reflections contain distortions. Some loss of detail already accrues from the simplification process and from the restriction to a minimum set. The minimum indicators, therefore, must be reliable, sensitive, and valid within a range that is appropriate to the decisions to be made on the basis of the feedback they provide.

Finally, performance indicators generally must be produced and delivered to the user rapidly, soon after the monitored performance has occurred. Here too, tolerance for delay between performance and feedback will depend on the purposes of the user. All criteria for a PAS typically are relative to the uses for which the PAS is designed. The acceptable range of variation, or the standard, for any criterion must be judged on the basis of the costs and risks involved in the actions or decisions based on the PAS.

A performance assessment system was conceptualized by Weirich (1981) as having three components:

1. Performance indicators (PI), the set of measures or indices used to reflect performance

2. A performance management information system (PMIS), the formal system of persons, equipment, and procedures to collect, edit, process, store, and retrieve the performance indicators

3. A process of management decisionmaking (PMGT), the collective users of the PMIS who interpret the indicators, integrate their meaning with other considerations, and make decisions or take action relative to the system being measured

Thus,  PAS  = PI  + PMIS  + PMGT 
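The three-part decomposition can be pictured as a small pipeline. The sketch below is a loose illustration and not Weirich's specification. The class names mirror the PI, PMIS, and PMGT labels, while the indicator, the record field, and the acceptable range are hypothetical. The management step is reduced to flagging values that fall outside a standard, which is far narrower than the judgment the PMGT component describes.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Indicator:
    """PI: one measure chosen to reflect performance."""
    name: str
    compute: Callable[[list], float]  # turns stored records into a value
    acceptable: tuple                 # (low, high) standard set by the user

@dataclass
class PMIS:
    """PMIS: collects and stores records and retrieves indicator values."""
    records: list = field(default_factory=list)

    def collect(self, record: dict) -> None:
        self.records.append(record)

    def report(self, indicators):
        return {ind.name: ind.compute(self.records) for ind in indicators}

def pmgt_review(report, indicators):
    """PMGT stand-in: list the indicators that fall outside their standard."""
    flagged = []
    for ind in indicators:
        low, high = ind.acceptable
        if not low <= report[ind.name] <= high:
            flagged.append(ind.name)
    return flagged

# Hypothetical indicator: share of new clients seen within 7 days of request.
seen_within_week = Indicator(
    name="seen_within_7_days",
    compute=lambda recs: sum(r["days_to_first_visit"] <= 7 for r in recs) / len(recs),
    acceptable=(0.75, 1.0),
)

pmis = PMIS()
for days in [3, 10, 5, 14]:
    pmis.collect({"days_to_first_visit": days})

report = pmis.report([seen_within_week])
print(report)                                   # {'seen_within_7_days': 0.5}
print(pmgt_review(report, [seen_within_week]))  # ['seen_within_7_days']
```

The standard is stored with the indicator rather than inside the PMIS. This follows the point made above that acceptable ranges depend on the decisions the user intends to make, not on the data system.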

This chapter is concerned primarily with the second PAS component, the PMIS. It examines how characteristics of the formal data collection systems, including people, paper, and machines, can influence the overall performance assessment system. Obviously, it is necessary to keep in mind that the characteristics of the PMIS will interact with features of the management structure and processes (PMGT). Consequently, we must address the nature of the objects or events being encoded for PMIS input, as well as the factors that influence the interpretation and use of the PMIS's output. Of the three components (PI, PMIS, and PMGT), the technical operations of the PMIS are probably the best developed and best understood. The most formidable barriers to a performance assessment system appear to lie within the first and third components.


The  Anatomy  of  an  Information  System 

In a formally designed and operated information system, information is born of data elements that have been collected, processed, and stored according to explicit rules and procedures. Data elements become information by the way they are treated and combined. These technical procedures determine whether the data elements can be transformed into information that is reliable, valid, and timely.

Data elements are the basic symbols used to record characteristics of the world that we want to represent in our information system. Obviously the selection and definition of data elements is critical, both as a limit on what the system can do and as a determinant of what it will cost. Data collection does not follow automatically. Rather, it is often a costly activity, and it may redefine data elements or be a source of error. Once collected, data elements must be processed to ensure accuracy and completeness. How carefully data collection and processing are done should depend upon the importance of accuracy. The method of data storage is critical for the future versatility of data retrieval, which, in turn, depends upon the planned analyses. Analysis and application are the tasks of combining retrieved data elements into patterns of "useful information."

This stage of transforming data elements into information involves data reduction (to select relevant data) and time compression (aggregating data over time). However, these processes all take time to perform, leading to delay in feedback.
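The two transformation steps can be illustrated on a handful of service-contact records. In this minimal sketch, the record fields, the service categories, and the monthly reporting period are all hypothetical. Data reduction is a filter, and time compression is a sum by calendar month. A month's total is final only once that month has closed, which is one concrete source of the feedback delay just described.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data elements: one record per service contact.
contacts = [
    {"client": "A", "date": date(1983, 1, 5), "service": "outpatient", "minutes": 50},
    {"client": "B", "date": date(1983, 1, 19), "service": "outpatient", "minutes": 45},
    {"client": "A", "date": date(1983, 2, 2), "service": "outpatient", "minutes": 50},
    {"client": "C", "date": date(1983, 2, 9), "service": "inpatient", "minutes": 480},
]

def monthly_outpatient_hours(rows):
    """Turn raw contact records into hours of outpatient service per month."""
    # Data reduction: keep only the elements relevant to this question.
    relevant = [r for r in rows if r["service"] == "outpatient"]
    # Time compression: aggregate the remaining elements by (year, month).
    totals = defaultdict(float)
    for r in relevant:
        totals[(r["date"].year, r["date"].month)] += r["minutes"] / 60
    return dict(sorted(totals.items()))

print(monthly_outpatient_hours(contacts))
```

Each step discards detail. The monthly totals no longer show which clients were served or on which days, so any question that needs that detail has to go back to the stored data elements.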


Factors  Working  Against  Assessment 

A critical consideration in any information or communication system is the nature of the relationships existing in the network of senders and receivers. For example, permanent networks need different standards than transitory ones do. Networks characterized by equity and equality of power will exchange information differently than those where power differentials are great. Therefore, when it comes to national or State systems to assess the performance of Government-funded agencies, we should not delude ourselves about what is at stake.

Information  is  a form  of  power  or  a prime 
tool  in  the  exercise  of  power.  In  effect,  then, 
the  transmission  of  information  can  be  a 
transmission  of  organizational  power.  In  a 
competitive  and  highly  adversarial  world,  the 
communication  of  information  can  be  risky. 
Most  organizations  resist  giving  away  power. 

Human service information systems generally have been constructed to assemble information for use by regulating agencies and individuals outside the boundaries of the organization that implements the information system. Accurate and timely information transmitted to the external agency thus empowers that external agency, potentially to the detriment of the communicating organization. Consider, for example, the operations management system (OMS), initiated in 1980 by the U.S. Department of Health and Human Services and used by the National Institute of Mental Health to manage community mental health centers (CMHCs). The OMS specified 13 performance measures, using statistics that CMHCs were required to submit to the Government (NIMH 1981). The initial concept called for review of each center's standing on these measures as a factor in determining future funding. This Federal policy tied statistical monitoring to funding. The information communicated to the Government enabled the Government to increase its monitoring capacity, an ability likely to reduce the freedom of the monitored agencies. This reality helps us understand why information systems have not been implemented on a widespread and effective basis in the public sector.

Other observers have tried to explain the slow development of management information systems by pointing to technological limitations, costs, threats to client confidentiality, and lack of adequately trained technicians and managers. In our view, these important factors pale in contrast to the interorganizational struggles that often result in wary self-protection, failures to communicate, and transmissions of outright propaganda.

Some organizations fail to implement information systems for defensive reasons related to organizational anxieties triggered by their autonomy/dependency struggles with Government or other oversight agencies, including the media. Adding further complexity to this herculean effort are the consumer interest groups, special interest lobbies, and professional societies that have emerged as allied or conflicting domains. In this ongoing struggle, most service agencies have steadily maintained a head-in-the-sand strategy. They hope quixotically for the continuation of categorical grant-based funding without having to produce evidence of actual needs for service or of the productivity, quality, or cost of services.

Information systems that survive and mature are able to meet both internal and external demands for information. Systems initially constructed to support internal information requirements related to measurement of effort, productivity, cost assessment, and billing operations seem especially able to flourish. Such systems are typically more than able to meet information demands from the external environment. In contrast, information systems constructed to respond almost exclusively to external bureaucracies tend to crumble with changing external circumstances. Important and legitimate as these external requirements may be, they do not constitute the rationale essential to erecting information systems that work and evolve at the service delivery level (Broskowski and Attkisson 1982).

In the short range, externally driven information systems may be useful for budget maintenance skirmishes at higher levels. Inevitably, however, these short-range strategies are inflexible and have sparse value for program performance assessment beyond documentation of utilization rates. Robust information system capacity at the service delivery level can circumvent the many problems of the externally driven system, as the greatest need for detailed information exists at the local level (Attkisson and Nguyen 1981). External information needs can be viewed as byproducts of more detailed internal information needs and can be met by integrating data elements for external communications. Thus, performance assessment systems, as the next developmental manifestation of externally driven data reporting systems, should be derivative systems, built upon the structure and capacity of internally driven information systems.

Applications  of  Performance  Assessment 

The feasibility of any large-scale performance assessment system depends upon its intended purposes. What are some of the possible applications of a PAS by State and Federal Governments?

• To  develop  tentative  norms  reflecting 
local  agency  performance  for  use  in 
agency  self-evaluation. 

• To develop tentative norms of agency performance that Government can use to monitor Government-funded activities and use as a basis for improving Government policy and program.

• To monitor the past performance of local agencies to determine if performance was within acceptable ranges, given the local circumstances surrounding each agency. This feedback could lead to the Government's taking corrective measures in Government policy, practice, or funding, or to action to improve local agency performance through technical assistance, support, or incentive.

• To assess the performance of a State or Federal agency under the assumption that the performance of a collection of local agencies is a surrogate measure of oversight agency performance. For example, the collective progress of CMHCs throughout a State on deinstitutionalization programs may reflect the State department of mental health's ability to provide guidance, technical assistance, and leadership.




• To  extend  the  control  of  the  central 
funding  and  regulatory  agency  over  the 
actions  of  local  agencies. 

• To inform potential consumers (individuals and organizations) about the performance of agencies when there are options from which to choose.

If the purpose of the assessment system is to help agencies improve their own performance by providing environmentally referenced performance norms, the feasibility is moderate. If the system is to be used primarily as a basis for resource allocations, local agency resistance will reduce the feasibility to minimal levels. In brief, although the technical capabilities exist to collect, process, and transmit key indicators, the conceptual and financial barriers to doing so on any large scale are indeed formidable.


Problems  Affecting  PAS  Viability 

Complex processes govern the operation and effectiveness of human service agencies. Further, the influence of environmental factors on the behavior of local organizations is so strong that key indicators are not likely to provide information of sufficient validity to guide policymaking or resource allocation decisions at higher levels. We must either develop environmentally specific norms for each chosen indicator or better understand how each indicator is influenced by environmental factors.

The costs of designing and operating information systems that collect data reliably and that transmit data in a timely fashion are prohibitive. The technology exists, but it is expensive.

A national or multisite performance assessment system depends upon the quality of the information system within each site. If information quality varies greatly across sites, the total information network is threatened. The least common denominator in a multisite performance assessment system is the information quality and availability of the least adequate site in the information network. One solution is to grade the information quality and retrievability of each site and to limit aggregate analyses to sites with acceptable data quality. Such limitations, however, attenuate the generalizability and applicability of the results. Moreover, facilities with a well-functioning information system may also be superior organizations in other ways. If this is true, aggregate results that eliminate facilities with inadequate information systems will be biased and will reflect only a limited sample of service organizations.
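The selection bias described above can be illustrated with hypothetical numbers. The sketch assumes, purely for illustration, that the true indicator values of the poorly documented sites are known. The site labels, quality grades, and improvement rates are invented. If weaker data systems go with weaker performance, restricting the aggregate to well-documented sites overstates the network's average.

```python
from statistics import mean

# Hypothetical sites: data-quality grade and a performance indicator
# (share of clients with improved functioning).
sites = [
    {"site": "S1", "data_grade": "A", "improved": 0.62},
    {"site": "S2", "data_grade": "A", "improved": 0.58},
    {"site": "S3", "data_grade": "B", "improved": 0.55},
    {"site": "S4", "data_grade": "C", "improved": 0.41},
    {"site": "S5", "data_grade": "C", "improved": 0.39},
]

def aggregate(sites, acceptable_grades):
    """Return (number of sites kept, mean indicator) for sites with acceptable data."""
    kept = [s for s in sites if s["data_grade"] in acceptable_grades]
    return len(kept), mean(s["improved"] for s in kept)

print(aggregate(sites, {"A", "B", "C"}))  # every site
print(aggregate(sites, {"A", "B"}))       # only sites with acceptable data
```

In this example, dropping the two grade-C sites raises the apparent mean from 0.51 to about 0.58. In practice the excluded sites' true values would be unknown, so the size of such a bias could not be read off the retained data.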

Valid information on interconnected systems costs money. Highly interconnected distributed service systems, especially in the commercial sector (e.g., franchises and manufacturer outlets), invest an important fraction of available resources in information services (Davis 1974; Martin 1977). These investments pay off, because accurate information reinforces and maintains systemic interconnections. The investments also support primary decisions about resource allocation, staff productivity, and organizational productivity.

Multisite health and human service organizations rarely have this degree of interconnection. Consequently, comparable efforts to develop adequate information systems have lagged far behind the private sector. Without requisite organizational incentives and controls, it is not possible to achieve the necessary technological standardization. In addition, a critical mass of resources, technology, and expertise is required in every setting to design and implement the essential personnel and equipment configuration.


Recommendations 

Our discussion of performance assessment systems in this chapter leads to several recommendations to the field:

• Local service agencies should be encouraged to develop PAS for internal uses, supported by Government funds and technical assistance. This approach will help to identify performance indicators that are most reasonable in terms of sensitivity, validity, efficiency, and timeliness.

• Continued efforts to develop large-scale PAS should be characterized by collaborative partnerships among local, State, and Federal agencies. The development of the Mental Health Statistics Improvement Program, a cooperative venture of Federal, State, and local service agencies (NIMH 1983), is an example of the process to be followed.

• Funding and regulatory agencies at higher levels should provide expanded direction to operating agencies at lower levels about information systems needed to sustain and improve the service delivery system. Capacity to respond to those demands must be developed from the bottom up.

• Focused research should be conducted on specific performance indicators to determine how they interact with other factors, such as organizational size, structure, and environmental variations.

• Special collaborative studies and panel-survey designs can reduce overall PAS costs. For example, 48 mental health centers chosen to represent a larger universe are participating in a survey that will provide client-specific information about the match of services to clients (Rosen et al. 1981). This approach focuses effort and limits the burden of supplying data while providing more information about a national program.

• The current rhetoric to reduce Government regulation of service delivery should be translated into reality. Given sufficient flexibility at the local level, managers and agency policymakers will be more inclined to begin monitoring their performance if they have the discretion to make necessary changes.

• Federal and State government agencies can enhance service agency information capability through technical assistance and grant programs. Specifically, two contributions by Government could be very important. Actual distributed processing systems (Davis 1974; Sorensen and Elpers 1978), with adequate client and agency safeguards, can be supported through grants and contracts, while efforts to develop and disseminate prototype information systems should continue with increased priority and improved coordination.

References 

Attkisson, C.C., and Nguyen, T.D. Evaluative research and health policy: Utility, issues, and trends. Health Policy Quarterly 1:22-42, 1981.

Broskowski, A., and Attkisson, C.C. Information Systems for Health and Human Service Programs. Unpublished manuscript, 1982.

Davis, G.B. Management Information Systems: Conceptual Foundations, Structure, and Development. New York: McGraw-Hill, 1974.

Martin, J. Computer Data Base Organization. 2d ed. Englewood Cliffs, N.J.: Prentice-Hall, 1977.

National Institute of Mental Health. A Guidebook to the 1981 Operations Management System for Federally Funded Community Mental Health Centers. Rockville, Md.: the Institute, Division of Biometry and Epidemiology, 1981.

National Institute of Mental Health. Series FN No. 8, The Design and Content of a National Mental Health Statistics System, by Patton, R.E., and Leginski, W.A. DHHS Pub. No. (ADM)83-1095. Rockville, Md.: the Institute, 1983.

Rosen, B.; Windle, C.; and Walter, R. "The CMHC Panel Survey: At Last, a Client-Based Survey for CMHCs." Presentation at the National Council of CMHCs meeting, Dallas, Texas, April 1981.

Sorensen, J.E., and Elpers, J.R. Integrated management information systems. In: Attkisson, C.C.; Hargreaves, W.A.; Horowitz, M.J.; and Sorensen, J.E., eds. Evaluation of Human Service Programs. New York: Academic Press, 1978.

Weirich, T.W. "Performance Assessment in Community Mental Health: Evaluators Help Their Clients Leap Tall Buildings in a Single Bound." Unpublished manuscript, 1981.




Measuring  the  Quality  of  Medical  Care 
Process  Versus  Outcome* 

William  E.  McAuliffe,  Ph.D. 

Department of Behavioral Sciences
Harvard University School of Public Health


This article examines relevant empirical evidence and the logic of major arguments relating to process versus outcome measurement. The arguments include assertions concerning practical data problems, impacts on medicine and the public interest, and measurement validity. Analysis reveals that outcome measures are not clearly superior: they are less direct than process measures, they have major practical problems, and their validity has rarely been tested empirically. Although process measures have been studied more often than outcome measures, the extent of the validity and effectiveness of process assessments is also virtually unknown because the research methods used up to now have been inadequate. Thus, there is little reason for favoring outcome assessments over process assessments.


Outcome  Measures  of  Quality  Care 

The main argument for outcome measurement is simply that, since the goal of care is health, one should concentrate on measuring the achievement of health (Brook 1974, p. 29; Thompson and Osborne 1974, p. 808; Palmer 1976, p. 33). Thus, McClure (1973, p. 334) and Brook et al. (1977) have explained that an outcome approach would be more direct and would skirt squabbles over whose process is most effective by letting the results speak for themselves. Inspecting outcomes would also ensure attention to the cost-effectiveness of care, which is a central concern of decisionmakers. Many authors have asserted that an outcome approach would therefore be superior (Osborne and Thompson 1975, p. 627; Schroeder and Donaldson 1976).

*This chapter was adapted from an article by W.E. McAuliffe in the Milbank Memorial Fund Quarterly/Health and Society 57:118-152, Winter 1979. Copyright 1979, by the Milbank Memorial Fund. His research was supported by the Executive Programs in Health Policy and Management, Contract N01-AH-44105 with the Bureau of Health Manpower, Health Resources Administration, U.S. Department of Health, Education, and Welfare.

Also, few questions have even been raised concerning the measurement validity of assessing objective end results such as death, disease, or disability; these measures have been accepted at face value (Donabedian 1969, p. 34). Even proponents of process assessment have often conceded the validity of outcome measures (Schroeder and Donaldson 1976, p. 50) and have granted that process and structural elements are ultimately "validated" by their "correlation with outcomes" (Donabedian 1966, p. 169; DeGeyndt 1970, p. 36). Another argument for outcome measures (Donabedian 1969) is that the intangibles of care (e.g., a physician's judgment), which seem difficult to measure directly with process techniques based on the medical record, are revealed in the patient's outcome.

Conceptual  Arguments  for  Outcome 
Measurement 

Close examination reveals major flaws in the logic of the arguments for the superiority of outcome measures. Although the ultimate goal of most medical care and quality-of-care regulation is improved health, it does not follow that the quality of medical care in any particular case can be defined by whether health was attained. The best attempts can fail, sometimes even in a majority of cases, whereas at other times patients routinely recover in spite of substandard treatment. No regulatory body can insist that patient outcomes be positive, nor do positive outcomes ensure that care was appropriate or skillful.

The goal of quality assessment is not to produce health, at least not directly; it is to determine whether acceptable care was rendered. Presumably, if proper care is given, the best achievable outcome under the circumstances will result. The direct approach to assessment would be to observe care (the process) firsthand (Donabedian 1978, pp. 856-857). A less direct method of assessment would be to observe whether the patient had a good outcome as the result of the process. Unfortunately, it is often unclear whether the outcome was primarily a result of the process.

One reviewer has objected that judging the quality of care by direct observation of process assumes that one knows which process results in the best outcomes, which he asserts is seldom the case. The necessary experimental evidence of efficacy is absent for most medical procedures. Consequently, he claims, one must examine the outcomes to determine whether proper care was rendered.

Assessing uncontrolled outcomes, however, is no solution for the absence of experimental evidence regarding process or structure. Randomized, controlled experimentation is preferred to medical opinion for determining the efficacy of medical procedures precisely because the effects of factors other than medical care that affect outcomes (such as disease severity) must be eliminated before one can safely infer that outcome variation reflects the effects of care. The same extraneous factors are operative (and uncontrolled) in medical audit studies. If, in clinical research, the connection between process (or structure) and outcome is too ambiguous to infer causality unless experimental controls are employed, then what epistemological basis can there be in an uncontrolled audit study for inferring that an undesirable outcome resulted from inadequate care rather than from other factors? Thus, examining outcomes directly does not offer a way around the constraints imposed by the limits of medical knowledge.

In theory, the outcome variance associated with irrelevant factors could be eliminated statistically, but developing and testing satisfactory statistical models would probably not be much easier than conducting randomized trials. The many methods that have been proposed for refining outcome measures are discussed later in this chapter. Here, it is enough to point out that constructing a statistical model of outcomes that successfully identifies the variance in outcomes associated with the effects of care is, practically speaking, equivalent to making nonexperimental causal inferences about the relationship between the process and outcome of care.

If quality of care does not always correlate with patient outcomes, then, one might ask, why bother assuring "quality"? The answer requires recognition that, even when a process of care is efficacious, patient outcomes may still have no correlation or even a negative correlation with process. This seeming paradox is explained by the distinction between a correlation in an experiment (indicating causality) and a correlation in a descriptive audit. The existence of a causal connection between process and outcome implies a significant correlation (although the correlation need not be strong) in a properly designed randomized, controlled trial; in an uncontrolled audit, that correlation can be completely obscured by other factors. Thus, assuring performance of efficacious care is desirable even if outcome measures lacking needed controls do not correlate with it in a medical audit.
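The paradox can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the severity distribution, the recovery probabilities, and both assignment rules are invented for illustration. The treatment raises every patient's chance of recovery by the same amount, so it is efficacious by construction. When it is assigned at random (a controlled trial), process and outcome correlate positively; in an "audit" of routine care where sicker patients are more likely to receive it, severity masks the benefit and the correlation turns negative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n, randomized, seed=1):
    """Return (process, outcome) lists for n hypothetical patients.

    Severity is uniform on [0, 1). Treatment adds 0.10 to every patient's
    probability of recovery, so it is efficacious for everyone.
    randomized=True: treatment assigned by coin flip (controlled trial).
    randomized=False: P(treated) equals severity, so sicker patients are
    treated more often (uncontrolled audit of routine care).
    """
    rng = random.Random(seed)
    process, outcome = [], []
    for _ in range(n):
        severity = rng.random()
        if randomized:
            treated = rng.random() < 0.5
        else:
            treated = rng.random() < severity
        p_recover = 0.85 - 0.70 * severity + (0.10 if treated else 0.0)
        recovered = rng.random() < p_recover
        process.append(1.0 if treated else 0.0)
        outcome.append(1.0 if recovered else 0.0)
    return process, outcome


r_trial = pearson(*simulate(20_000, randomized=True))
r_audit = pearson(*simulate(20_000, randomized=False))
print(f"randomized trial:   r = {r_trial:+.2f}")  # positive
print(f"uncontrolled audit: r = {r_audit:+.2f}")  # negative
```

With these invented parameters the expected correlations are roughly +0.10 in the trial and -0.13 in the audit, even though the care given is identical in its effect.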

In sum, the conceptual arguments for selecting outcome measurement over process measurement prove to be rather weak when examined closely. Although the goal of medical care is health, the achievement of health by any particular patient under uncontrolled conditions does not establish, or even necessarily indicate, that the care received was acceptable. End results do not speak for themselves. Outcome assessment is not an adequate regulatory solution when medical knowledge is inadequate. On the basis of logic alone, care itself (the process) should be the prime object of quality-of-care measurement, but other considerations besides logic must be weighed before determining which type of measure would be best in any given situation.

Practical  Obstacles  to  Outcome 
Assessment 

Proponents of outcome measurement admit that its full adoption hinges on finding solutions to a number of practical problems (Institute of Medicine 1974). Large samples are needed when evaluating rare outcomes; followup surveys for gathering data on posthospitalization outcomes can be expensive; there may be a long time lag between treatment and important final outcomes; and setting standards for outcome measures may be difficult (McAuliffe 1978a). Schroeder and Donaldson (1976) have described the difficulties one health maintenance organization (HMO) encountered in locating patients and judging outcomes when implementing an outcome-oriented quality assessment. What has not been immediately obvious is that these "practical" constraints cause invalidity in outcome assessments. An explanation follows.


The  Validity  of  Outcome  Measures 

Writers on quality assessment generally agree that outcome measures have a form of validity known as "face validity," but the basis for this conclusion can be questioned. Face validity simply appeals to one's intuitive judgment: "Does the measure seem valid?" Although it coincides with commonsense notions of validity, measurement experts take a dim view of face validation; they consider it untrustworthy compared with empirically based strategies. "Obviously" valid measures often fail to stand up under empirical testing (Selltiz et al. 1963, p. 151; Cronbach 1971, p. 453), and outcome measures have rarely been subjected to empirical validation.

Yet what could be wrong with observations of death, or even of disease or disability? Surely they objectively measure what they purport to, and their "empirical validity" could be shown if one made the effort. Perhaps. But one must avoid a common misunderstanding here. The validity of an indicator may vary depending on which concept it seeks to measure. Even when death or disease measures health status validly, it may measure quality of care much less well. For example, the outcomes of care depend in part upon patient compliance, and research shows that noncompliance occurs in a substantial percentage of cases (Marston 1970; Wilson 1973). Although many have recognized that end results have determinants in addition to quality of care, they have not recognized that the variance associated with these other factors is systematic measurement error, a type of invalidity. Thus, the validity of measures of quality of care based on health status or patient satisfaction is less than entirely obvious, since they reflect extraneous factors to some extent.

The most forceful arguments for outcome assessments of quality have come from economists, who are typically concerned with evaluating the performance of industrial organizations that theoretically should have a high degree of control over the quality of their output. But one cannot uncritically generalize these arguments to measuring hospital performance, since hospitals have much less control over outcomes.

If outcome measures have less than perfect validity, exactly how strong is the likely connection between quality of care and outcome measures? Because quality of care is a hypothetical construct, there is no way to determine the answer directly. But there are reasons to believe that nonquality determinants of outcomes could be substantial (see McAuliffe 1978a for numerous examples of questionable outcome measures), and so the validity of outcome measures cannot be taken for granted. Contrary to current practice, outcome measures must be empirically validated just as process measures must, for outcome measures of quality are not obviously valid.

Because experts in quality-of-care assessment have almost always taken "validation" to mean "correlation with outcomes," many readers may still have difficulty understanding how outcome measures of quality of care could be invalid, or how outcome measures might be validated empirically. This difficulty is just one of the many reasons for believing that the definition "correlation with outcomes" is too narrow for validating measures of quality, be they structural, process, or outcome measures (see McAuliffe 1978b for a detailed discussion). There is a broader theory of measurement validity, developed primarily by psychologists, which offers an analytical framework appropriate for assessing the validity of outcome measures.

According to psychometric measurement theory, validity is defined as the amount of correspondence between a concept (such as quality of care) and a measure (such as an outcome index). The measure is valid insofar as it is pure (excludes extraneous factors), complete (covers all relevant aspects), and representative (has the proper balance or mix of relevant aspects). Validity is expressed quantitatively as the proportion of a measure's variance that is associated with the concept of interest (Kerlinger 1965; Cronbach 1971; Nunnally 1978). The remaining variance represents either systematic or random measurement error.
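This definition can be written compactly. The notation below is introduced here for illustration and does not appear in the original; it assumes, as classical psychometric theory commonly does, that the components are uncorrelated. Let the observed variance of an outcome measure $X$ be split into a part due to quality of care ($Q$), a part due to systematic error such as patient mix or compliance ($S$), and random error ($E$):

$$
\sigma^2_X = \sigma^2_Q + \sigma^2_S + \sigma^2_E,
\qquad
\text{validity} = \frac{\sigma^2_Q}{\sigma^2_X}.
$$

For instance, with hypothetical components $\sigma^2_Q = 1$, $\sigma^2_S = 2$, and $\sigma^2_E = 1$, only $1/4$, or 25 percent, of the outcome variance would be valid, even though $(1+2)/4$, or 75 percent, of it is free of random error.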

Extraneous Outcome Variance.— Although few investigators have explicitly evaluated the validity of outcome measures of quality of care, there have been some important exceptions. Roemer et al. (1968) noted that crude hospital mortality rates have been viewed with much skepticism as measures of quality because patient characteristics (diagnosis, severity of illness, and general health status) may vary greatly from one hospital to the next, and patient-mix differences may be more important than quality-of-care differences in determining mortality rates. The skepticism appeared well founded, since Roemer et al. found that crude death rates were higher in teaching hospitals than in nonteaching hospitals, higher in accredited hospitals than in nonaccredited hospitals, and higher in more technologically sophisticated hospitals than in less technologically sophisticated hospitals. Goss and Reed (1974) have reported similar results. Roemer et al. asserted that hardly anyone would suggest that these results mean that the quality of care was superior in the nonteaching, nonaccredited, or less technologically sophisticated hospitals. A more probable explanation is that the crude death rate had low validity as an indicator of hospital quality.
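How patient mix alone can produce this kind of pattern can be sketched with direct standardization, one form of the age- or severity-adjusted rates mentioned later in this chapter. The counts below are invented for illustration: the hypothetical teaching hospital has the lower death rate within each severity stratum, yet its crude rate is higher because it treats a larger share of severe cases.

```python
# Hypothetical counts: (patients, deaths) by severity stratum.
HOSPITALS = {
    "teaching":  {"mild": (200, 2),  "severe": (800, 80)},
    "community": {"mild": (800, 16), "severe": (200, 30)},
}


def crude_rate(strata):
    """Deaths divided by patients, ignoring case mix."""
    patients = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / patients


def standardized_rate(strata, standard):
    """Direct standardization: weight each stratum-specific death rate
    by that stratum's share of a common standard population."""
    total = sum(standard.values())
    return sum((d / n) * standard[s] / total for s, (n, d) in strata.items())


# Use the pooled case mix of both hospitals as the standard population.
standard = {}
for strata in HOSPITALS.values():
    for s, (n, _) in strata.items():
        standard[s] = standard.get(s, 0) + n

for name, strata in HOSPITALS.items():
    print(f"{name:9s} crude {crude_rate(strata):.1%}  "
          f"adjusted {standardized_rate(strata, standard):.1%}")
# teaching  crude 8.2%  adjusted 5.5%
# community crude 4.6%  adjusted 8.5%
```

Adjustment reverses the ranking here only because the invented strata capture the whole case-mix difference; with real data and proxy measures of severity, an adjusted rate would still need to be validated.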

Other studies have verified that substantial proportions of the variance in mortality are associated with factors other than hospital quality. In a study of surgical mortality rates conducted as an outgrowth of the National Halothane Study (Bunker et al. 1969), Moses and Mosteller (1968) showed that much of the variance in rates was associated with nonquality factors. Standardization for patient differences, type of operation, and patient physical status explained 24.3 percent, 68.6 percent, and 40.8 percent, respectively, of the variance in mortality rates among the study's 34 hospitals (calculated from table 5, Bunker et al. 1969, p. 196). A composite of type of operation and physical status explained 74.4 percent of the variance. Moreover, the authors could not determine the precise proportion of variance attributable to quality (that is, they could not show that any of the outcome variance was valid) because they had no independent measures of quality of care (e.g., process measures). So, although these findings do not prove conclusively that uncontrolled death rates are largely invalid as measures of quality of care, they are consistent with doubts raised about crude mortality rates as such measures.

Following up the Moses-Mosteller inquiry, the Stanford Center for Health Care Research (Scott et al. 1976; Flood et al. 1977) undertook another study of hospital differences in postoperative patient mortality and morbidity. Recognizing the need to remove the effects of extraneous variables, the researchers statistically adjusted the outcomes for differences due to stage of disease (severity), patient's age, sex, physical status, cardiovascular status, and whether the surgery was elective or emergency. The percentage of variance accounted for by those controls varied by diagnosis from a low of 2 percent to a high of 44 percent. Flood et al. (1977) then introduced measures of quality inputs, including hospital characteristics (size, teaching status, and expenditures) and surgeon characteristics (specialization, certification, number of residencies, etc.). At best, all surgeon and hospital characteristics combined accounted for no more than 1 percent of the outcome variance, even though the variance due to patient characteristics had already been removed. These results, should they be confirmed by subsequent research, raise serious questions concerning the validity of existing outcome measures of quality of care and the current viability of outcome approaches to quality assessment.

Finally, Martini et al. (1977) have analyzed the percentages of British regional variations in rates of mortality, complications, and morbidity that are explained by sociodemographic factors (e.g., age, socioeconomic status) and by the structure of regional medical care systems (e.g., expenditures, percentage of care occurring in teaching hospitals). The authors concluded that "indexes constructed from the traditional outcome measures are more sensitive to sociodemographic circumstances . . . than to the amount of medical care provided and/or available" (Martini et al. 1977, p. 306). Although the study's focus was not quality of care in hospitals, its sample was small (15), and the quality of inputs and process was measured only crudely, the consistency of its results with those of the other studies already reviewed nevertheless helps build a case against the uncritical acceptance of outcome measures of quality of care.

Data Quality.— Outcome measures can also be impure, and therefore invalid, as a result of random rather than systematic measurement error. In psychometric theory, random measurement error is defined as unreliability, which in turn sets a ceiling for validity (Nunnally 1978); to the extent that a measure is unreliable, it is invalid.
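The ceiling can be stated with a standard result of classical test theory; the notation is introduced here for illustration and is not the original author's. If $r_{XX'}$ is the reliability of a measure $X$ and $r_{YY'}$ that of any criterion $Y$, the correlation between them is bounded by

$$
r_{XY} \le \sqrt{r_{XX'}\, r_{YY'}} \le \sqrt{r_{XX'}}.
$$

For example, an outcome index with a hypothetical reliability of 0.49 could correlate no higher than $\sqrt{0.49} = 0.7$ with true quality of care, so at most $0.7^2$, or 49 percent, of its variance could be valid, and any systematic error would lower that figure further.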

Although outcome assessments based on mortality are frequently preferred on the grounds that they are more objective (objectivity is one component of reliability), the other aspects of health status (e.g., symptoms, functional level) are less objective. Brook (1973) found that physicians often disagreed when judging patients' outcomes from followup interview data. The average correlation among the 10 judges was 0.61 (Brook 1973, p. 38, author's calculation).
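The original does not describe how the average of 0.61 was computed. A minimal sketch of the straightforward calculation, averaging the Pearson correlation over every pair of judges, is shown below; the ratings are hypothetical, and with 10 judges there would be 45 such pairs.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def average_interjudge_correlation(ratings):
    """Mean Pearson correlation over every pair of judges.

    ratings[j][p] is judge j's outcome rating for patient p.
    """
    return mean(pearson(a, b) for a, b in combinations(ratings, 2))


# Three hypothetical judges rating the same five patients (1 = worst, 5 = best).
ratings = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
print(round(average_interjudge_correlation(ratings), 2))  # 0.63
```

The pairwise correlations here are 0.8, 0.8, and 0.3. Averaging raw coefficients is the simplest choice; some analysts average Fisher z-transformed values instead, which gives a slightly different figure.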

Random measurement errors can contaminate outcome measurements in other ways as well. Patients' physical condition or subjective reports of symptoms may fluctuate from one day to the next; pathology laboratory reports may be in error (Donabedian 1969, pp. 28-29); physiological measures, such as urine cultures or blood pressure readings, are sometimes in error (Labarthe et al. 1973; Maskell and Pead 1976); outcome data in medical records may be incomplete (Fessel and Van Brunt 1972, table 3); and errors may be made in the process of abstracting data from the charts.

Linn et al. (1974) correlated assessments of outcome (13 categories of "impairment") based on record review with comparable assessments made by the patients' attending physicians at discharge. The 13 correlations ranged from 0.19 to 0.66, with a median of 0.46. The disagreements in assessment were not due entirely to poor recordkeeping, however, since the attending physicians' assessments were shown to be somewhat unreliable, and the medical-record-based assessments predicted death at followup slightly more accurately.

Incomplete Outcome Measures.— Another potential source of invalidity in outcome measures is the incompleteness of assessments based on only some of the relevant effects of medical care. This incompleteness is in part the methodological upshot of the "practical" obstacles to gathering data on long-term and other difficult-to-observe effects of care. Since much medical care is directed toward outcomes occurring after discharge, large components of quality could remain unassessed if one were to rely solely on "intermediate outcomes" from inpatient medical records.

Outcome measures also are often not sensitive to many diagnostic aspects of care (McAuliffe 1978b). Since most medical audit studies sample cases by diagnosis, they often exclude from consideration patients incorrectly diagnosed as a result of inadequate process (see Greenfield et al. 1977 for a study of such a sample). If such patients are discharged and have a poor outcome as a result of not receiving needed care, their cases could easily be overlooked because the patients ended up in different hospitals. Furthermore, many diagnostic procedures are designed to detect special (but often rare) management problems, such as allergic drug reactions. Failure to perform these essential procedures for all patients will affect only the outcomes of patients having the problem. Often there will be no such patients in samples as small as the usual 50 cases examined in medical care evaluation studies, and if so, an outcome assessment would fail to reflect important diagnostic inadequacies in care (see McAuliffe 1978b for examples). In general, if a medical process includes medically warranted procedures whose effects are unrepresented in the study's outcome data, the outcome data are incomplete and therefore somewhat invalid as a measure of quality of care.
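How often a 50-case sample would contain no affected patient can be estimated with a simple binomial model, assuming cases are drawn independently. The text gives only the 50-case figure; the prevalence values below are hypothetical. The second function gives the smallest sample that would include at least one affected patient with 95 percent probability.

```python
import math


def prob_no_affected(prevalence, sample_size=50):
    """Probability that a sample contains no patient with the problem,
    treating cases as independent draws (a binomial model)."""
    return (1 - prevalence) ** sample_size


def sample_size_for_one_case(prevalence, confidence=0.95):
    """Smallest sample giving at least `confidence` probability of
    including one or more affected patients."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


# Hypothetical prevalences of a rare management problem.
for prevalence in (0.005, 0.01, 0.02, 0.05):
    print(f"prevalence {prevalence:.1%}: "
          f"P(none in 50 cases) = {prob_no_affected(prevalence):.2f}, "
          f"n for 95% chance of at least 1 case = "
          f"{sample_size_for_one_case(prevalence)}")
```

At a prevalence of 1 percent, for example, about 60 percent of 50-case samples ($0.99^{50} \approx 0.605$) would contain no affected patient at all, and 299 cases would be needed to be 95 percent sure of seeing even one.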

Proponents of outcome measurement might nevertheless counter that outcome measures still are more complete than process or structural measures, because most relevant components of care (including unrecorded aspects of surgical or medical care, as well as the performance of other segments of the medical care system) affect patients' outcomes. But the concept "outcome" is itself a broad and complex construct, and if few extant outcome measures cover its domain satisfactorily, then outcome measures may not be more complete than process measures. For example, death rates often may not detect differences in care as higher levels of performance and skill are achieved, or where a disease is rarely life-threatening. Consequently, data on mortality should be supplemented by data on morbidity, functional status, subjective distress, and so on, if the outcome assessment is to approach completeness.

How differences in completeness affect outcome measures is illustrated by the Stanford study described earlier (Scott et al. 1976), which employed five measures of outcome reflecting different combinations of data on mortality, severe morbidity, moderate morbidity, and postoperative complications. Measure 1 (death) and Measure 5 (death or incomplete return to function) showed no significant differences between hospitals, whereas the other three outcome measures (death or severe morbidity; death or moderate or severe morbidity; death or monitors or catheters) resulted in significant interhospital differences. Thus, the conclusions one draws regarding quality of care would depend upon the outcome measure one chose to examine. Three other quality-of-care studies (Brook 1973; Romm et al. 1976; Shortell et al. 1976) also found low correlations among alternative outcome measures.

The results of these four studies thus furnish additional weight against the uncritical acceptance of the validity of outcome measures of quality of care. It is likely that up to now researchers have employed obviously incomplete outcome measures on the assumption that the different dimensions of outcome (death, disease, disability, etc.) were highly correlated, and therefore the omitted data would be mostly redundant. Such an assumption is probably unwarranted in many cases.


Techniques  for  Increasing  the  Validity 
of  Outcome  Measures 

Since most of the problems with outcome measures have long been recognized, even if not labeled as invalidity, many techniques have been proposed over the years for improving outcome measures. The techniques include statistical adjustments (e.g., age-adjusted mortality rates, multiple regression), examining patterns of care because they are more reliable than individual cases (Jacobs and Jacobs 1974, p. 46), judgmentally discounting unpreventable poor outcomes (Jacobs and Jacobs 1974, p. 40), using statistically derived standards (cutoffs) for acceptable outcome rates (McAuliffe 1978a), and focusing on "tracers" (Kessner et al. 1973) or "sentinel" outcomes (Rutstein et al. 1976) that are known to be relatively "pure" measures of quality. Each of these techniques seeks in its own way to maximize the valid proportion of the outcome variance.

However promising the techniques may be, none has yet been shown to be both practical and effective. For example, application of advanced methods of statistical adjustment such as those employed in the Stanford study (Scott et al. 1976) requires considerable expertise that is not widely available, may demand elaborate data collection efforts, and has not yet been proven effective. Use of these techniques does not guarantee that the resulting measure will possess acceptable validity; the final outcome assessments still must be validated. Because this point is so important, yet routinely missed, a specific instance follows.

In the study of hospital death rates mentioned above, Roemer et al. (1968) hypothesized that validity might be increased if the rates were adjusted for case severity. Because ideal adjustments would require collecting extensive data on diagnosis and disease severity, Roemer et al. chose instead to adjust the death rates for occupancy-corrected length of stay, which the authors considered a practical "approximate measure" of case severity. However, make-do or proxy measures usually reduce the effectiveness of statistical controls, and therefore the index's validity was still in doubt. Roemer et al. compared their index to measures of hospital technological adequacy, Joint Commission on Accreditation of Hospitals (JCAH) accreditation, and voluntary versus proprietary status. For most comparisons, but not all, the statistical adjustment successfully reversed the previous relationships between these structural measures and the uncorrected death rate, and thus appeared to increase validity. But Goss and Reed (1974) were unable to replicate Roemer et al.'s findings on a sample of 97 hospitals. Goss and Reed argued that length-of-stay-adjusted death rates, like crude death rates, have doubtful validity and need further refinement.


Reciprocal Validation of Structure,
Process, and Outcomes

How could outcomes be such poor measures of quality if they are the "ultimate validators" of process and structural criteria? First of all, as explained earlier, one cannot assume that an outcome measure would necessarily be as valid in an uncontrolled audit study as it would be in a randomized, controlled experiment designed to assess the efficacy of structure or process.

It is also incorrect to assume that a measure employed to validate (in the measurement sense) another indicator is necessarily superior in validity. In fact, whenever a new, more refined measure is developed, its initial validation usually includes comparison with existing and accepted, but ultimately less valid, measures of the concept.

Finally, although outcome measures are used to validate structural and process criteria, the reverse is also true, as was shown in studies by Roemer et al. (1968) and by Kisch and Reeder (1969). Using a measure as a validator assumes validity but does not convey it; structure, process, and outcomes can validate each other only because theoretically each can be assumed to possess some validity. If, let us say, process and outcome measures do agree in a properly designed study, then our faith in the validity of both is strengthened. Should they fail to agree, other information (e.g., their respective correlations with structural indices) is needed to interpret the failure. Again, which measure was formally designated as the "validator" means nothing by itself.




It should now be clear that claims of validity for outcome measurement on the grounds of greater objectivity and completeness were not solidly based. Whatever "obvious" validity outcome measures seem to possess fades when examined closely, for there are many ways outcome measures could have low validity.

Are Outcome Measures Clearly Superior
After All?

The advocacy of outcome measures was based almost entirely on a theoretical, rather than empirical, analysis of the measures, and the main propositions in the argument have now been examined in detail. In response to the contention that outcomes are the proper object for quality assessment because the goal of medical care is health, it can be argued that the quality of care is more directly gauged by focusing on medical care, the process; outcomes are less direct manifestations of quality. Outcome measures do not possess the face validity claimed for them, because it is apparent that outcomes usually reflect more than just the effects of care, often do not include many relevant effects of care, and are based on data that are poor in quality. Moreover, validity cannot be assumed for outcome measures just because they serve to validate structural and process criteria, since the reverse is also true.

At present, there is also little empirical evidence that demonstrates the high validity of outcome measures that proponents have assumed. Because the validity of outcome measures had typically been taken for granted, few studies sought to provide the necessary empirical confirmation. Examination of limited existing data has shown that doubts about the validity of outcome measures may be well founded. Factors unrelated to the medical care system accounted for substantial proportions of outcome variance, far more, in fact, than did measures of medical inputs. In addition, alternate measures of outcome often failed to correlate with each other. Various statistical techniques, such as multiple regression, have been proposed as possible solutions to these problems, but none has yet been convincingly demonstrated to be effective or practical. Clearly, further research is needed before the validity of operational outcome measures can be decided. Thus, outcome measures are not demonstrably superior to other types of measures.


Process  Measurement 

If outcomes are not clearly best, how do they compare with process measures, which have long been under attack? In this section, the relevant arguments and evidence are examined; because previous interpretations prove to be either incorrect or overstated, the section concludes that process measures are at least as promising as outcome measures.

Criticisms of Process Measurement

After decades of using structural criteria and a brief period of flirting with the idea of a process-oriented audit, the JCAH recently adopted an outcome-oriented system for measuring quality of care. According to Jacobs and Jacobs (1974, p. 32), who designed JCAH's outcome audit, they passed over process-auditing for the following reasons:

. . . [It] is cumbersome; a list of the processes of care for all but the simplest diagnoses can include many dozens of items, and each of these must be checked off for each chart reviewed. . . . The relationships of many health care processes to desired health care results is questionable or, at best, unverified by empirical evidence. . . . The uncritical use of process measures runs the danger of penalizing practitioners who obtain satisfactory patient outcomes by routes other than those prescribed by process criteria, thus stifling innovations in treatment. In response, practitioners may order tests and procedures to satisfy criteria lists, rather than on the basis of their best clinical judgement, thereby increasing the use of ancillary services. (author's italics)

Process-auditing from medical records or abstracts also has been criticized because the data often are incorrect or incomplete (Zuckerman et al. 1966), and therefore fail to reflect what actually happened to the patient. Kelman (1976) also questioned whether recording "done/not done" for various procedures overlooks the intangible "true quality" or skill facets of medical care. Just because a procedure was performed does not mean it was done well. Brook et al. (1976, p. 17) pointed out that process-auditing based on medical records would also routinely miss psychosocial aspects of care (patient satisfaction). Finally, current audits focus on physician or nurse performance only and ignore the various other aspects of patient care.

To summarize, process measurement has been criticized as (1) impractical because criteria are difficult to develop and cumbersome to apply; (2) undesirable in its impact on innovations and medical costs; and (3) invalid since it covers limited aspects of care, many processes have not been proven effective, and data sources contain errors.

Analysis  of  the  Criticisms 

Practical  Problems  in  Process-Auditing.— 
Although applying many process criteria may seem to be more trouble than applying a few outcome criteria, the advantage to outcome measures would hold only as long as essential outcome data were readily obtainable from medical records. But outcome data in medical records often are quite limited, and if followup surveys are needed to fill the gap, then an outcome approach could be as much trouble as process-auditing. In fact, even proponents of outcome measurement (Starfield 1974) recommend process assessment as more convenient when outcomes are long-term (e.g., immunizations; for other examples, see Rosenberg 1977, p. 1936). Added to the cost of collecting outcome data are the difficulties of performing the sophisticated statistical analyses outcome measures seem to require.

Outcome systems typically escape the burdens of collecting followup survey data by sacrificing the completeness of their assessments, and process assessments permit similar tradeoffs. The cumbersome aspects can be reduced by focusing on only key process criteria, ideally the criteria that most clearly differentiate between adequate and inadequate care (see Richardson 1972 for such an example).
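One way to pick such key criteria is to rank each criterion by how much more often it is met in charts of adequate care than in charts of inadequate care. The sketch below assumes that every chart has already been classified as adequate or inadequate by some independent judgment (for example, peer review); the criterion names and counts are hypothetical, and this is a generic illustration rather than the procedure Richardson (1972) used.

```python
from typing import Dict, List, Tuple

# Hypothetical audit tallies for each criterion:
# ((charts meeting it, total charts) among adequate care,
#  (charts meeting it, total charts) among inadequate care)
Tally = Tuple[Tuple[int, int], Tuple[int, int]]

criteria: Dict[str, Tally] = {
    "urinalysis ordered": ((95, 100), (90, 100)),
    "culture before antibiotics": ((80, 100), (30, 100)),
    "follow-up culture": ((70, 100), (25, 100)),
    "temperature recorded": ((98, 100), (97, 100)),
    "allergy history taken": ((85, 100), (50, 100)),
}


def discrimination(tally: Tally) -> float:
    """Difference in compliance rates between adequate and inadequate care."""
    (met_a, n_a), (met_i, n_i) = tally
    return met_a / n_a - met_i / n_i


def key_criteria(table: Dict[str, Tally], top_n: int) -> List[str]:
    """Return the top_n criteria that best separate adequate from inadequate care."""
    ranked = sorted(table, key=lambda name: discrimination(table[name]), reverse=True)
    return ranked[:top_n]


print(key_criteria(criteria, 2))
```

A criterion such as a recorded temperature, met almost universally in both groups, carries little information and would be a candidate for dropping; in practice the counts would need to be large enough for the differences to be stable.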

Developing process criteria may currently consume large amounts of audit committees' energies, but it is likely that the committees concentrate on process criteria more because of the abilities and interests of their members than because of inherent differences in the types of measure. Audit committee members are medical personnel who are more interested in and better trained for evaluating the predominantly medical issues raised when developing process criteria than for evaluating the statistical issues more commonly raised by the selection of outcome criteria. If the committees included more statistically oriented measurement experts, and if the validity of outcome measures received the amount of attention it deserved, then developing outcome measures could easily require as much time and effort as is currently spent on process criteria.

It is also noteworthy that outcome-oriented methods such as those of the JCAH require process assessment to determine why a patient's outcome was unsatisfactory and how care should be improved, and so these methods require developing and applying both process and outcome criteria. In principle, at least, if the outcome criteria are complete, developing process criteria for verifying unfavorable outcomes should be just as difficult as it would be for a normal process audit.

Impacts on Innovation and Medical Efficiency.—Process-auditing might hinder true innovation and lead to "defensive medicine" (e.g., ordering unnecessary laboratory tests), but the extent would hinge on how rigidly audit committees adhere to prescribed criteria and standards and apply sanctions. Current trends are clearly toward flexible criteria, numerous reviews by peers before finding fault, and many opportunities for appeal. Yet some infringements on the freedom of clinicians are inevitable in any system of regulating care, regardless of the method of assessment. Even in the JCAH's outcome system, physicians must be prepared to justify decisions that deviate from standard practice whenever outcomes are poor. It is therefore hard to see why a system based on process assessment would be more stifling than one that combined outcome and process.

Validity: Data Quality.—Ultimately, problems of data quality may profoundly affect how quality-of-care assessments are conducted. Researchers performing retrospective process audits have found that both process and outcome data are incompletely recorded (e.g., Zuckerman et al. 1966; Fessel and Van Brunt 1972; Lindsay et al. 1976), and the missing data no doubt reduce the validity of the measures, especially if missing data sometimes reflect negative findings and other times reflect noncompliance with criteria. However, Roos et al. (1977, p. 3) argue that validation studies have demonstrated that medical records are more accurate than responses to questionnaire surveys (one important source of outcome data), and since physicians now know that records are subject to review, their records should become more complete. Also, medical records could be improved by standardizing recording formats. There is a danger, however, that a heightened awareness of the role of medical records in regulation could result in instances of falsification; different data sources for both process and outcome may eventually be needed in special cases.

At present, it is difficult to say precisely to what extent low data quality affects the validity of process-auditing, because there are no completely adequate studies. For example, Zuckerman et al. (1966) documented the incompleteness of medical record data by comparing the content of medical records with audio tape recordings of the same patient-physician encounter. But to estimate the effect of the missing information on the validity of process assessments, one would have to go one step further than Zuckerman et al. by process-auditing the medical records and the tapes separately, and then correlating the two independent assessments.
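The validation step proposed above, auditing the records and the tapes separately and then correlating the two assessments, reduces to computing a correlation between two scores per encounter. The sketch assumes that each audit yields the proportion of process criteria met for each encounter; the five encounters and their scores are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical proportion of criteria met per encounter, audited two ways.
from_records = [0.60, 0.70, 0.80, 0.50, 0.90]
from_tapes = [0.70, 0.80, 0.85, 0.60, 0.95]

r = pearson(from_records, from_tapes)
mean_gap = sum(t - m for t, m in zip(from_tapes, from_records)) / len(from_tapes)
print(f"correlation = {r:.3f}, tapes minus records = {mean_gap:.3f}")
```

In this made-up case the records understate compliance by 8 percentage points on average yet rank the encounters almost identically (r above 0.99). A low correlation would instead suggest that incomplete records misrank the care audited, which bears more directly on validity than the average shortfall does.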

Validity: The Limits of Medical Record
Data.—The validity of process assessment has been questioned because it concentrates on physicians' technical performance, which is just one component of care, and measures even that component crudely, since it counts merely what is done (e.g., a surgical procedure) but not how well.

These appear to be rather serious shortcomings, but without further study it is difficult to estimate the comparative disadvantage or its importance to regulators. It must be remembered that outcomes are also far from perfect as measures of high-level medical skill. Furthermore, Federal quality regulation tends to be concerned with determining whether the minimum rather than the highest standards of care have been met, and therefore the upper ranges of medical skill are probably beyond regulators' range of interest. And so, performance or nonperformance of essential procedures may represent the lion's share of what regulators want to know. In any case, the "intangibles of care" actually can be measured indirectly by process assessment, just as they are by outcome measurements.

Indirect measurement is achieved as long as the characteristics being measured correlate or overlap with those unmeasured, and there is evidence that the elements of good care (including taking adequate histories and ordering necessary tests, as well as the "intangibles") do correlate with one another (Peterson 1956, p. 19; Rosenfeld 1957, p. 862; Lyons and Payne 1974). So, if a physician orders the correct diagnostic tests, and if ordering correlates with skillfulness and thoroughness in taking a history, then the odds are that he or she will have taken a good history; failing to observe the history-taking session may therefore cause little harm. Usually, pairwise correlations between process criteria are modest, but the multiple correlation between a single criterion and the usual large number of other criteria in a process composite may be quite high, so high that including that one criterion (such as a measure of skill in history-taking) in the composite might add virtually no new information (see Richardson 1972 for a demonstration). Thus, process measures are theoretically just as capable of measuring the unmeasurable as are outcome measures. Whether in practice either type of measure validly reflects these aspects of care is unknown.
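The claim that a criterion with only modest pairwise correlations can add almost nothing to a large composite can be checked under a simple idealized model. The sketch below assumes, purely for illustration and not from the source, that every criterion equals a common "quality" factor times the same loading plus independent noise; it computes population correlations from that model rather than from audit data, and the number of criteria and the loading are arbitrary choices.

```python
import math
from typing import Tuple


def composite_correlations(k: int, loading: float,
                           noise_sd: float = 1.0) -> Tuple[float, float, float]:
    """Population correlations for k criteria, each = loading * quality + noise.

    Returns (r between any two criteria,
             r between one criterion and the sum of the other k - 1,
             r between the full composite and the composite without that criterion).
    Quality has variance 1; noise terms are independent with sd noise_sd.
    """
    var_item = loading ** 2 + noise_sd ** 2   # variance of one criterion
    cov_pair = loading ** 2                   # covariance of any two criteria
    m = k - 1                                 # criteria other than the one set aside
    var_rest = m * var_item + m * (m - 1) * cov_pair
    cov_item_rest = m * cov_pair
    var_all = var_rest + var_item + 2 * cov_item_rest

    pairwise = cov_pair / var_item
    # With equal loadings the least-squares weights are equal by symmetry,
    # so this is also the population multiple correlation.
    item_vs_rest = cov_item_rest / math.sqrt(var_item * var_rest)
    all_vs_rest = (var_rest + cov_item_rest) / math.sqrt(var_all * var_rest)
    return pairwise, item_vs_rest, all_vs_rest


pairwise, item_vs_rest, all_vs_rest = composite_correlations(k=30, loading=0.5)
print(f"pairwise r = {pairwise:.2f}")
print(f"one criterion vs. the other 29 = {item_vs_rest:.2f}")
print(f"composite with vs. without it = {all_vs_rest:.3f}")
```

Under these assumptions the pairwise correlation is 0.20, the single criterion correlates about 0.42 with the remaining 29, and the composite barely changes (r of about 0.998) when that criterion is dropped. Real criteria are unlikely to be this homogeneous, so the figures illustrate the mechanism rather than any audit's findings.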

If current process data prove to be too skimpy for some aspects of care, modifications in assessment methods may be necessary. For example, process data in the medical record might be too insensitive to the aspects of care that produce high rates of postoperative infection. In those instances, either the process data source could be improved (by direct observation or improved recording of relevant data, for example) or outcome measures added. Cost and relative validity would dictate the choice.

Validity: Correlations With Outcomes.—The chief charge against process assessment is that it lacks validity because many procedures are ineffective, as shown by studies that have failed to find strong correlations between process and outcomes. A recent review of nine published studies of process-outcome correlations found little to support the claim that process-auditing is generally invalid (McAuliffe 1978b). Three studies reported nonsignificant correlations, two had mixed results, and four reported significantly positive correlations. But drawing conclusions from these results is difficult, because the studies had serious methodological flaws (discussed in detail in McAuliffe 1978b) that either made obtaining positive correlations difficult or otherwise left unclear the meaning of a nonsignificant correlation.

The most obvious shortcoming of the studies was their outcome measurement. Since a correlation between two measures depends on the strengths and weaknesses of both, one must first rule out the possibility that the outcome measure is invalid before one can safely infer that the process measure is at fault when a correlation is low. But numerous apparent shortcomings were found in the specific outcome measures employed in the studies, and therefore the nonsignificant correlations could have been entirely attributable to those weaknesses.
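Why a weak outcome measure can hide a valid process measure can be illustrated by simulation. The sketch assumes a classical measurement model that does not come from the source: an unobserved quality score, with a process measure and an outcome measure that each equal quality plus independent noise, the outcome noise standing in for all factors unrelated to care. Under that model the expected process-outcome correlation is the product of the two measures' correlations with quality, so a noisy outcome lowers the observed correlation even when the process measure is unchanged.

```python
import math
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_r(n: int, process_noise_sd: float, outcome_noise_sd: float,
               seed: int = 1) -> float:
    """Observed process-outcome correlation for n simulated patients."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    process = [q + rng.gauss(0.0, process_noise_sd) for q in quality]
    outcome = [q + rng.gauss(0.0, outcome_noise_sd) for q in quality]
    return pearson(process, outcome)


def expected_r(process_noise_sd: float, outcome_noise_sd: float) -> float:
    """Product of each measure's correlation with quality (quality variance = 1)."""
    validity_process = 1.0 / math.sqrt(1.0 + process_noise_sd ** 2)
    validity_outcome = 1.0 / math.sqrt(1.0 + outcome_noise_sd ** 2)
    return validity_process * validity_outcome


# Same process measure each time; only the outcome measure gets noisier.
for outcome_noise in (0.5, 1.5, 3.0):
    print(outcome_noise,
          round(simulate_r(20_000, 0.5, outcome_noise), 2),
          round(expected_r(0.5, outcome_noise), 2))
```

In this toy model the process measure's own correlation with quality (about 0.89) never changes, yet the expected process-outcome correlation falls from 0.80 to about 0.28 as the outcome measure degrades, which is exactly the pattern that could be misread as evidence against the process measure.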

The studies also were improperly designed for the purpose of validation. To obtain a correlation, for example, one must have variation in both variables, but in these audit studies there was often little or no variation in process, in outcomes, or in both. Consequently, if the designs of some of the studies were improved, they might yield completely different results; indeed, the analysis showed that the best designed studies were those reporting positive correlations.
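The requirement of variation in both variables can be illustrated with simulated data of the same kind. The model (quality plus independent noise for each measure) and the narrow band used to mimic an audit population with little variation in process are assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ValueError if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def simulate(n: int, seed: int = 2) -> Tuple[List[float], List[float]]:
    """Process and outcome scores that both equal quality plus noise (sd 0.5)."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    process = [q + rng.gauss(0.0, 0.5) for q in quality]
    outcome = [q + rng.gauss(0.0, 0.5) for q in quality]
    return process, outcome


process, outcome = simulate(20_000)
full_r = pearson(process, outcome)

# Keep only cases whose process scores fall in a narrow band.
narrow = [(p, o) for p, o in zip(process, outcome) if abs(p) < 0.3]
narrow_r = pearson([p for p, _ in narrow], [o for _, o in narrow])

print(f"full range: r = {full_r:.2f}; narrow process range: r = {narrow_r:.2f}")
```

The underlying relationship is identical in both cases; only the spread of process scores differs. A study drawn from a setting where nearly all charts meet the criteria would resemble the second case, and in the limiting case of no variation at all the correlation is undefined.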

Evaluating  the  Case  Against 
Process  Assessment 

The arguments against process measurement are not persuasive. Process-auditing currently involves more data elements than does outcome assessment, and thereby process audits seem to be more trouble than outcome audits. But eventually, equivalent amounts of effort may be needed for outcome measurement, if followup surveys and sophisticated statistical analysis are required to bolster outcome validity. Some observers have charged that process-auditing will stifle innovation and result in defensive medicine, but those effects are not unique to process-auditing. The validity of process-auditing has been challenged largely on the basis of studies of process-outcome correlations, but the studies' results were not universally negative, and the most negative studies were so poorly designed that drawing firm conclusions from them is virtually impossible. The studies also have led to doubts regarding the cost-effectiveness of process regulation, but reanalysis of relevant data revealed that the results were somewhat more promising than previously thought. Final appraisal of the validity and impact of process regulation must await better research, but the best existing studies leave room for optimism. In any event, although many potential pitfalls of process-auditing can be identified, there is little definitive evidence at this point that warrants rejecting the process approach in favor of outcome measurement.

Conclusions 

At present, there is little solid basis for the widespread view that outcome measures are superior to process measures for assessing the quality of medical care. Although many leading authorities have been arguing vigorously that outcome measurement is ultimately preferable (and they have apparently convinced most other observers), the logic of their arguments and the supporting empirical evidence had heretofore never been examined closely.

Analysis shows that there are parallel sets of problems encountered whether one measures quality by process or by outcome. Practically speaking, both types of measures require a base of knowledge concerning the medical relevance of criteria. At present, we have relatively little scientific evidence on the efficacy of medical procedures or on the relevance of outcome variance to the effects of care. Outcomes assessment is clearly not a solution to the problems created by the lack of evidence on efficacy, nor does regulating by outcomes ensure that medical care will become cost-effective.

Practical  problems  of  data  collection  and 
data  quality  affect  both  process  and  outcome 
measures.  Both  measures  draw  heavily  on 
medical  records  data,  and  therefore  suffer 
similarly  from  the  incompleteness  and  inac- 
curacies in  medical  records.  Also,  medical 
records  abstractors  disagree  when  coding  both 
process  and  outcome.  The  relative  costs  of  the 
two  approaches  cannot  be  weighed  without 
taking  into  account  validity  and  the  efforts 
needed  to  ensure  validity.  Up  to  now,  only 
process  validity  seems  to  have  received  ade- 
quate attention. 

At present, it is unclear which type of measure is likely to be more valid. Although the validity of outcome measures has rarely been investigated, quantitative evidence from a number of relevant studies tended to confirm the suspicion that many outcome measures may be largely invalid as indices of quality.

In contrast to outcome measures, process measures have been attacked repeatedly on the grounds of validity. However, the attacks were based primarily on studies of process-outcome correlations that were methodologically unsound. Process indices should not be faulted if they do not correlate highly with outcome measures that have doubtful validity themselves.

Obviously, we currently do not know enough to make a clear choice between process and outcome measures as the best method of assessing quality of care. Up to now, discussions of the relative merits of the measures have been almost entirely lacking in empirical evidence. Conceptual discussions of possible pitfalls of various measures are a useful first step, but more refined assessments are now needed, since neither process nor outcome is obviously superior. Quality of care refers conceptually to optimal performance by the medical care system to produce the best possible outcome under the circumstances. Because it is so difficult to determine in any particular case precisely what constitutes optimal performance or the best possible outcome, quality of care will be difficult to measure no matter what approach or blend of approaches is employed. If we are to learn how quality can be measured most validly and practically, further research using improved validation methods is essential.


References 

Brook,  R.H.  Quality  of  Care  Assessment:  A 
Comparison  of  Five  Methods  of  Peer 
Review.  DHEW  Pub.  No.  (HRA)74-3100. 
Washington, D.C.: Supt. of Docs., U.S. Govt.
Print.  Off.,  1973. 

Brook,  R.H.  A skeptic  looks  at  peer  review. 
Prism  2(10):  29-32,  1974. 

Brook, R.H.; Davies-Avery, A.; Greenfield, S.;
et al. Quality of Medical Assessment Using
Outcome Measures: An Overview of the
Method. Santa Monica, Calif.: Rand
Corporation, 1976.

Brook,  R.H.;  Davies-Avery,  A.;  Greenfield,  S.; 
et  al.  Assessing  the  quality  of  medical  care 
using  outcome  measures:  An  overview  of  the 
method.  Medical  Care  15(9):Supplement, 
1977. 

Bunker,  J.P.;  Forrest,  W.H.;  Mosteller,  F.;  et 
al.  The  National  Halothane  Study.  Washing- 
ton, D.C.:  Supt.  of  Docs.,  U.S.  Govt.  Print. 
Off.,  1969. 

Cronbach,  L.J.  Test  validation.  In:  Thorndike, 
R.L.,  ed.  Educational  Measurement.  2d  ed. 
Washington,  D.C.:  American  Council  on 
Education,  1971. 

DeGeyndt,  W.  Five  approaches  for  assessing 
the  quality  of  care.  Hospital  Administration 
15  (Winter):21-42,  1970. 

Donabedian,  A.  Evaluating  the  quality  of 
medical  care.  Milbank  Memorial  Fund 
Quarterly  44(3):166-206,  1966. 

Donabedian,  A.  A Guide  to  Medical  Care  Ad- 
ministration. Volume  2:  Medical  Care  Ap- 
praised—Quality  and  Utilization.  New  York: 
American  Public  Health  Association,  1969. 

Donabedian, A. The quality of medical care.
Science 200(4344):856-864, 1978.

Fessel,  W.J.,  and  VanBrunt,  E.E.  Assessing 
quality  of  care  from  the  medical  record. 
New  England  Journal  of  Medicine  286(3): 
134-138,  1972. 

Flood,  A.B.;  Scott,  W.R.;  Ewy,  W.;  et  al.  Ef- 
fectiveness in  Professional  Organizations: 
The  Impact  of  Surgeons  and  Surgical  Staff 
Organizations  on  the  Quality  of  Care  in 
Hospitals.  Stanford,  Calif.:  Stanford  Center 
for  Health  Care  Research,  1977. 

Goss,  E.W.,  and  Reed,  J.I.  Evaluating  the 
quality of hospital care through
severity-adjusted death rates: Some pitfalls. Medical
Care  12(3):202-213,  1974. 

Greenfield, S.; Nadler, M.A.; Morgan, M.T.; et
al. The clinical investigation and management
of chest pain in an emergency department:
Quality assessment by criteria mapping.
Medical Care 15(11):898-905, 1977.

Institute  of  Medicine.  Advancing  the  Quality 
of  Health  Care:  A Policy  Statement  by  a 
Committee  of  the  Institute  of  Medicine. 
Washington, D.C.: National Academy of
Sciences,  1974. 

Jacobs,  C.M.,  and  Jacobs,  N.D.  The  PEP 
Primer:  The  JCAH  Performance  Evaluation 
Procedure for Auditing and Improving
Physician Care. Chicago, Ill.: Quality Review
Center,  Joint  Commission  on  Accreditation 
of  Hospitals,  1974. 

Kelman,  S.  Improving  Doctor  Performance:  A 
Study  of  the  Use  of  Information  and  Organi- 
zational Change.  New  York:  Center  for  Pol- 
icy Alternatives,  1976. 

Kerlinger,  N.  Foundations  of  Behavioral  Re- 
search. New  York:  Holt,  Rinehart,  and  Win- 
ston, 1965. 

Kessner, D.M.; Kalk, C.E.; and Singer, J.
Assessing health quality: The case for tracers.
New England Journal of Medicine 288(4):
189-194, 1973.

Kisch,  A.I.,  and  Reeder,  L.G.  Client  evaluation 
of  physician  performance.  Journal  of  Health 
and Social Behavior 10(1):51-58, 1969.

Labarthe,  D.R.;  Hawkins,  C.M.;  and  Reming- 
ton, R.D.  Evaluation  of  performance  of 
selected  devices  for  measuring  blood 
pressure.  American  Journal  of  Cardiology 
32(Sept.  20):  546-553,  1973. 

Lindsay,  M.I.;  Hermans,  P.E.;  Nobrega,  F.T.; 
et  al.  Quality  of  care  assessment.  I.  Out- 
patient management  of  acute  bacterial 
cystitis  as  a model.  Mayo  Clinic  Proceed- 
ings 51(May):  307-312,  1976. 

Linn, B.S.; Linn, M.W.; Greenwald, S.R.; et al.
Validity of impairment ratings made from
medical records and from personal
knowledge. Medical Care 12(4):363-386, 1974.

Lyons,  T.F.,  and  Payne,  B.C.  The  relationships 
of  physicians’  medical  recording  perform- 
ances to  their  medical  care  performance. 
Medical  Care  12(5):463-469,  1974. 

Marston, M. Compliance with medical
regimens: A review of the literature. Nursing
Research 19(4):312-323, 1970.

Martini,  C.J.M.;  Allan,  G.J.B.;  Davison,  J.;  et 
al. Health indexes sensitive to medical care
variation.  International  Journal  of  Health 
Services  7(2):293-309,  1977. 

Maskell, R.M., and Pead, L.J. Urinary
infection in children in general practice: A
laboratory  view.  Journal  of  Hygiene  77: 
291-298,  1976. 

McAuliffe,  W.E.  On  the  statistical  validity  of 
standards  used  in  the  profile  monitoring  of 
health  care.  American  Journal  of  Public 
Health  68(7):645-651,  1978a. 

McAuliffe,  W.E.  Studies  of  process-outcome 
correlations  in  medical  care  evaluations:  A 
critique. Medical Care 16(11):907-930,
1978b. 

McClure,  W.  Four  points  on  quality  assurance. 
In:  Regional  Medical  Programs  Service,  ed. 
Quality  Assurance  of  Medical  Care.  Wash- 
ington, D.C.:  Health  Services  and  Mental 
Health  Administration,  DHEW,  1973. 

Moses,  L.E.,  and  Mosteller,  F.  Institutional 
differences  in  postoperative  death  rates: 
Commentary  on  some  of  the  findings  of  the 
National  Halothane  Study.  Journal  of  the 
American  Medical  Association  203(Feb  12): 
150-152,  1968. 

Nunnally,  J.C.  Psychometric  Theory,  rev.  ed. 
New  York:  McGraw-Hill,  1978. 

Osborne,  C.E.,  and  Thompson,  H.C.  Criteria 
for evaluation of ambulatory child health
care by chart audit: Development and
testing  of  a methodology.  Pediatrics  56 
(Supplement):  625-692,  1975. 

Palmer,  R.H.  Quality  assessment.  In:  Green, 
R.,  ed.  Assuring  Quality  in  Medical  Care. 
Cambridge,  Mass.:  Ballinger,  1976.  pp.  11- 
136. 

Peterson,  O.L.  An  analytical  study  of  North 
Carolina  general  practice:  1953-54.  Journal 
of  Medical  Education  31(pt.  2,  Dec.):  1-165, 
1956. 

Richardson, F.M. Methodological development
of  a system  of  medical  audit.  Medical  Care 
10(6):451-462,  1972. 

Roemer, M.I.; Moustafa, A.T.; and Hopkins, C.E.
A proposed hospital quality index:
Hospital death rates adjusted for care
severity. Health Services Research 3(1):
96-118, 1968.

Romm,  F.J.;  Hulka,  B.S.;  and  Mayo,  F.  Corre- 
lates of  outcomes  in  patients  with  conges- 
tive heart  failure.  Medical  Care  14(9): 
765-776,  1976. 

Roos,  N.P.;  Henteleff,  P.D.;  and  Roos,  L.L.  A 
new  audit  procedure  applied  to  an  old  ques- 
tion: Is  the  frequency  of  T & A justified? 
Medical  Care  15(1):  1-18,  1977. 

Rosenberg,  E.W.  Medical  audit  JCAH-style:  A 
negative  view.  Journal  of  the  American 
Medical Association 237(18):1935-1937,
1977. 

Rosenfeld,  L.S.  Quality  of  medical  care  in 
hospitals.  American  Journal  of  Public 
Health  47(7):856-865,  1957. 

Rutstein,  D.D.;  Berenberg,  W.;  Chalmers, 
T.C.;  et  al.  Measuring  the  quality  of  medical 
care: A clinical method. New England
Journal of Medicine 294(11):582-588, 1976.

Schroeder,  S.A.,  and  Donaldson,  M.  The  feasi- 
bility of  an  outcome  approach  to  quality  as- 
surance—A report  from  one  HMO.  Medical 
Care 14(1):49-56, 1976.

Scott,  W.R.;  Forrest,  W.H.;  and  Brown,  B.W. 
Hospital structure and postoperative mor-
tality and morbidity. In: Shortell, S.M., and
Brown, M., eds. Organizational Research in
Hospitals. Chicago, Ill.: Inquiry Book, Blue
Cross  Association,  1976.  pp.  72-89. 

Selltiz,  C.;  Jahoda,  M.;  Deutsch,  M.;  et  al.  Re- 
search Methods  in  Social  Relations,  rev.  ed. 
New  York:  Holt,  Rinehart,  and  Winston, 
1963. 

Shortell, S.M.; Becker, S.W.; and Neuhauser, D.
The effects of management practices on
hospital efficiency and quality of care. In:
Shortell, S.M., and Brown, M., eds. Organi-
zational Research in Hospitals. Chicago, Ill.:
Inquiry Book, Blue Cross Association, 1976.

Starfield,  B.  Measurement  of  outcome:  A pro- 
posed scheme.  Milbank  Memorial  Fund 
Quarterly  52(Winter):39-50,  1974. 

Thompson,  H.C.,  and  Osborne,  C.E.  Develop- 
ment of  criteria  for  quality  assurance  of 
ambulatory  child  health  care.  Medical  Care 
12(10):807-827, 1974.

Wilson,  J.T.  Compliance  with  instructions  in 
the  evaluation  of  therapeutic  efficacy:  A 
common  but  frequently  unrecognized  major 
variable.  Clinical  Pediatrics  12(6):333-340, 
1973. 

Zuckerman,  A.E.;  Starfield,  B.;  Hochreiter,  C.; 
et  al.  Validating  the  content  of  pediatric 
outpatient  medical  records  by  means  of 
tape-recording doctor-patient encounters.
Pediatrics 56(3):407-411, 1975.




The Status of Productivity Measurement in the Public Sector: An Update¹

Harry  P.  Hatry 

The  Urban  Institute 
Washington,  D.C. 


Unless  you  are  keeping  score,  it  is  difficult 
to  know  whether  you  are  winning  or  losing. 
This  applies  to  ball  games,  card  games,  and  no 
less  to  government  productivity  for  specific 
services  and  activities.  Productivity  measure- 
ments permit  governments  to  identify  problem 
areas  and,  as  corrective  actions  are  taken,  to 
detect  the  extent  to  which  improvements  have 
occurred. 

This  status  report  first  defines  productivity 
measurement,  then  describes  major  current 
problems  in  measurement  and  presents  a 
viewpoint  on  the  status  of  government  pro- 
ductivity measurement,  and,  finally,  briefly 
examines  the  likely  prospects  for  the  future, 
including  consideration  of  facilitating  and  in- 
hibiting factors. 


What  Is  Productivity  Measurement? 

Productivity  is  defined  most  often  as  the 
ratio  of  output  to  input  for  a particular  ac- 
tivity. To  apply  that  definition  to  any  par- 
ticular government  service,  however,  is  a 
complex  task  that  is  subject  to  controversy. 

Productivity measurement generally has been
defined in the public sector as encompassing
both efficiency and effectiveness. Efficiency
indicates the extent to which the government
produces a given output with the least
possible use of resources. Effectiveness
indicates the amount of end product, the real
service to the public, that the government is
providing. The U.S. Office of Personnel
Management, the agency that had responsibility
for measuring Federal productivity until 1982,
defined productivity "... as the sum of the
efficiency, effectiveness, quality, and
responsiveness with which products and services
are delivered" (USOPM 1980, p. 6).

¹This paper is an updated and somewhat expanded
version of "The Status of Productivity
Measurement in the Public Sector," an article
appearing in Public Administration Review,
January/February 1978.

Measures  of  productivity/efficiency  can 
take  various  forms,  including  the  following: 

a. The ratio of the number of units of work
accomplished to the number of units of input.
This is the classic productivity measurement.
The output is expressed in work units, such as
tons of garbage collected, number of arrests,
square yards of street patched, gallons of
water treated, etc. The input units can be
expressed as the number of employee hours
allocated to that activity (to measure labor
productivity) or in dollars, adjusted for
inflation over the relevant period of time (to
act as a substitute for all resources applied,
i.e., multifactor productivity). The ratios can
be expressed as output divided by input
(productivity) or input divided by output
(efficiency).

b.  Ratio  measures  that  consider  the  quality 
of  the  output.  Examples  include  "number 
of  clients  improved  per  employee  hour" 
(rather  than  "number  of  clients  treated 
per  employee  hour")  and  "number  of  ar- 
rests  that  survive  the  initial  judicial 
screening  per  police  officer."  These 
measures  currently  are  rarely  used  by 
governments,  in  part  because  of  the  lack 
of precedent and the need to revise data
collection procedures to obtain such
measures. But this form seems far more
valid than (a).

c. Utilization-availability measures. Sometimes
used for some government activities are
measures of downtime for vehicles and
equipment and productive hours for personnel.
In the latter case, employee time is logged to
distinguish the amount of time spent on
activities defined as productive from time
spent on other activities defined as
nonproductive (a maintenance crew waiting for
materials to work with or a police officer
waiting for a patrol car to become available).
These measures are proxy measures of
efficiency. Shifts in downtime or productive
time will not necessarily result in
improvements in output or costs. Such
improvements will occur only if additional
output results from the extra available time
or if personnel are reduced. (For example,
added time released for police officers to
permit them to patrol streets leads to
increased productivity only if that extra
patrol time leads to fewer crimes or more
successful arrests or some other delivery of
service.)

d. Productivity indices. These are used to
measure the change from one year to the next
in relative rather than absolute productivity.
Productivity indices can be constructed for any
of the types of measures described thus far,
but to date they have been constructed
primarily for ratios of output to input. The
level of productivity in a base year (or base
period of perhaps 2 to 3 years) is given the
value of 100, and performance in each later
year is then expressed as the ratio of that
year's performance to base-period performance,
multiplied by 100. An overall index of the
productivity of all government activities can
be computed as a weighted sum of the indices
for different activities (such as by weighting
each activity by the number of employee hours
it used in the base year). (See U.S. Joint
Financial Management Improvement Program 1976,
appendix F.) For a statistical analysis
approach that simultaneously relates outputs
to inputs, see Ross and Burkhead (1974) and
Neuman (1976).
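The ratio and index calculations in items a and d can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method. The activity names and figures are made up. Input is measured in employee hours only, so this is labor productivity. The overall index weights each activity by its base-period employee hours, as in the text's example, and the weights are normalized so the result is a weighted average.

```python
from dataclasses import dataclass


@dataclass
class ActivityPeriod:
    """Hypothetical output and labor input for one activity in one period."""
    output_units: float    # work units, e.g., tons of garbage collected
    employee_hours: float  # labor input


def productivity(p: ActivityPeriod) -> float:
    """Output divided by input (item a, productivity form)."""
    return p.output_units / p.employee_hours


def efficiency(p: ActivityPeriod) -> float:
    """Input divided by output (item a, efficiency form)."""
    return p.employee_hours / p.output_units


def productivity_index(base: ActivityPeriod, current: ActivityPeriod) -> float:
    """Current-period productivity relative to the base period, base = 100 (item d)."""
    return 100.0 * productivity(current) / productivity(base)


def overall_index(base: dict, current: dict) -> float:
    """Combine activity indices, weighting each by its base-period employee hours.

    The weights are normalized to sum to 1, so the result is a weighted average."""
    total_hours = sum(p.employee_hours for p in base.values())
    return sum(
        productivity_index(base[name], current[name]) * base[name].employee_hours / total_hours
        for name in base
    )


# Made-up figures for two hypothetical activities.
base = {
    "refuse collection": ActivityPeriod(output_units=10_000, employee_hours=5_000),
    "pothole patching": ActivityPeriod(output_units=2_000, employee_hours=1_000),
}
current = {
    "refuse collection": ActivityPeriod(output_units=11_000, employee_hours=5_000),
    "pothole patching": ActivityPeriod(output_units=1_900, employee_hours=1_000),
}
print(round(overall_index(base, current), 1))  # 107.5
```

Refuse collection scores 110 and pothole patching scores 95. Refuse collection used five-sixths of the base-period hours, so the combined index is 107.5.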

Current Problems in Measuring Productivity
of Government Services²

A number of difficult problems arise in
measuring productivity of Federal, State, and
local government agencies.

²This section is adapted from the American
Association for the Advancement of Science (1978)
and Hatry et al. (1979).

The outputs of government services are often
difficult to define and, even when defined, are
difficult to measure. Those output measures
that are readily available often do not
adequately reflect the real purposes of the
activities they purport to measure. Unlike the
private sector, most government agencies
provide services and not goods. Higher values
for productivity measures using these readily
available outputs do not necessarily mean
higher real government productivity. For
example, the Postal Service found substantial
improvements in the number of items handled
per employee but at the same time was
reporting substantial increases in mail
delivery times. Is this increased government
productivity?

Such  output  indicators  as  the  number  of  tons 
of  paving  materials  used  or  the  number  of 
potholes  patched  are  useful  measures  of  work 
accomplishment,  but  they  indicate  little  about 
the  purpose  of  the  service,  i.e.,  making  streets 
safe  and  rideable.  Similarly,  the  number  of 
clients  treated  in  a health  or  rehabilitation 
program  does  not  indicate  how  many  people 
are  actually  helped.  When  an  agency  provides 
these  intermediate  outputs  at  lower  cost,  this 
means  only  that  the  potential  for  measuring 
productivity has improved. Unless there is a
direct  link  between  the  intermediate  output 
and  the  final  output,  this  potential  will  not  be 
realized  (National  Research  Council  1979,  p. 
79). 

Counts of physical output in government
agencies seldom consider the quality of those
physical units of output or the level of
service provided. Quality can vary considerably
from one unit to another. What may, on the
surface, appear to be a homogeneous product is
likely to be multiple products with a variety
of levels of quality. This is particularly a
problem when productivity measurement begins
to be used by government agencies for such
major purposes as making budget decisions or
assessing employees as part of performance
evaluations. Employees then will be tempted to
increase the quantity of output at the expense
of its quality. An increase in the output per
employee hour achieved at the expense of a
reduction in the quality of the output is not
a true productivity improvement.

The potential perversities of the situation
are illustrated by an example from street
maintenance. If a maintenance crew does a
poor job patching a pothole, it might be called
back a few weeks later to make a second
patch after someone spotted the problem in
the road. In most local governments, two units
of output would be counted, despite the fact
that one was defective.

Similarly, changes in the level of service
provided can reduce or increase the amount of
output per unit of input. For example, if
garbage collection is changed from backdoor
collection to curbside collection, the unit
cost will go down by at least one-third. The
reduction in unit cost has been achieved by
reducing the level of service, not really by
increasing the productivity of the work force.

Government  agencies  need  to  develop 
quality  standards  and  procedures  for  system- 
atic assessments  of  the  quality  of  output  (even 
if  only  on  a sampling  basis).  Two  ways  to 
incorporate  information  on  the  quality  of 
output for productivity measurement are:

1. Exclude from the output count any output
that does not pass tests of quality. For
example, in assessing police productivity in
apprehending criminals, instead of counting
all arrests in the measure "arrests per police
employee," count only those arrests that
survive the initial judicial screening.
Similarly, exclude potholes that have to be
re-repaired within perhaps 3 months. When
assessing the productivity of activity to
place children in group residential care,
include in the output count only those
placements made within a designated number of
days and subsequently judged appropriate.

2.  At  the  very  least,  include  separate 
measures  reflecting  the  quality/ef- 
fectiveness of  the  activity.  Users  then 
can  see  whether  changes  in  quality,  and 
not  productivity,  might  explain  changes 
in  unit  costs. 
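Below is a minimal sketch of the first approach. It assumes a hypothetical log in which each pothole patch records how many days passed before it had to be redone, with None if it never was. The 90-day cutoff stands in for the text's "perhaps 3 months." The function and variable names are illustrative only.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def output_per_hour(outputs: Iterable[T], employee_hours: float,
                    passes_quality: Callable[[T], bool] = lambda _: True) -> float:
    """Count output units that pass the quality test, divided by employee hours.

    With the default test every unit counts, which gives the unadjusted ratio."""
    good_units = sum(1 for unit in outputs if passes_quality(unit))
    return good_units / employee_hours


def lasted_three_months(days: Optional[int]) -> bool:
    # Assumed quality test: the patch was not redone within 90 days.
    return days is None or days > 90


# Hypothetical patching log: days until each patch had to be redone (None = never redone).
days_until_redone = [None, 45, None, None, 200, 30, None, None]
crew_hours = 4.0

raw = output_per_hour(days_until_redone, crew_hours)
adjusted = output_per_hour(days_until_redone, crew_hours, lasted_three_months)
print(raw, adjusted)  # 2.0 1.5
```

Two of the eight patches failed within 90 days. The quality-adjusted figure of 1.5 good patches per crew hour is therefore a quarter below the raw count of 2.0.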

The difficulty of the incoming workload can
differ considerably. Productivity measurements
as usually calculated may actually reflect
differences in the character of the workload
and not productivity changes. The time and
effort needed to achieve a specific level of
output can be greatly affected by the
difficulty of the incoming workload. For
example, some license applications, some
welfare applications, and some road repairs
will be substantially easier to handle than
others. In group residential care for
children, finding appropriate placements for
emotionally disturbed adolescents likely will
require much larger expenditures of time and
resources than finding placements for younger,
better adjusted children. It generally will be
more difficult to identify and apprehend
offenders for cases in which there were no
witnesses than for cases in which witnesses
were present. Providing water at given quality
levels will probably be significantly more
costly for water treatment plants where the
quality of the incoming water is significantly
lower than the quality of water at other
locations.

The mix of workload may change over time or
be different for different facilities. For
example, street maintenance will be most
costly in those areas and those time periods
with adverse weather conditions. Thus, apparent
differences in the productivity indicators
actually may be due to a difference in the mix
of the incoming workload and not represent a
change in productivity. A hypothetical example
illustrates the problem (Hatry et al. 1979,
pp. 8-9):

                             1982      1983
Number of repairs             500       500
Overall cost per repair     $9.60    $11.20
Total cost                 $4,800    $5,600

Unit cost has increased by 17 percent. It
appears that efficiency has declined. When the
repairs are broken down by type of incoming
work, however, a different picture emerges:


                                1982              1983
                               Average cost      Average cost
                            No.    per unit   No.    per unit
Low-difficulty repairs      400      $8.00    200      $7.00
High-difficulty repairs     100     $16.00    300     $14.00
Total cost                          $4,800            $5,600

In 1983, the agency actually had lower unit
costs for both types of repairs. Efficiency had
increased! What happened is that the incoming
mix of workload changed significantly, with a
much greater proportion of more difficult
work occurring in 1983.
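The short calculation below checks the figures in the two tables and isolates the mix effect. It uses a fixed-weight comparison, which applies 1983's unit costs to 1982's workload mix. This is one simple approach assumed here, not one taken from the text. The dictionary layout and function names are illustrative.

```python
# Hypothetical street-repair example: (number of repairs, average cost per unit) by difficulty.
repairs = {
    1982: {"low": (400, 8.00), "high": (100, 16.00)},
    1983: {"low": (200, 7.00), "high": (300, 14.00)},
}


def aggregate_unit_cost(year):
    """Overall cost per repair: total cost divided by total number of repairs."""
    total_cost = sum(n * cost for n, cost in year.values())
    total_units = sum(n for n, _ in year.values())
    return total_cost / total_units


def unit_cost_at_fixed_mix(costs_year, mix_year):
    """Overall cost per repair if costs_year's category unit costs applied to mix_year's workload."""
    total_cost = sum(n * costs_year[category][1] for category, (n, _) in mix_year.items())
    total_units = sum(n for n, _ in mix_year.values())
    return total_cost / total_units


print(aggregate_unit_cost(repairs[1982]))                     # 9.6
print(aggregate_unit_cost(repairs[1983]))                     # 11.2
print(unit_cost_at_fixed_mix(repairs[1983], repairs[1982]))   # 8.4
```

At the 1982 mix, 1983 unit costs would have produced an average of $8.40 per repair, 12.5 percent below 1982's $9.60. The apparent 17 percent rise comes entirely from the shift toward high-difficulty work.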

Thus, it seems highly desirable to develop
procedures to classify the incoming workload
by difficulty level and to assess outcomes for
each such group. But this is much easier said
than done.




Two  ways  to  incorporate  information  on  the 
difficulty  of  the  incoming  workload  are  the 
following: 

1.  Collect  data  so  that  the  cost  per  unit 
can  be  calculated  for  the  work  accom- 
plished in  each  category  of  difficulty. 
With  such  information,  government  of- 
ficials and  the  public  can  assess  the  ef- 
ficiency of  a government  agency  in 
treating  each  difficulty  level.  The  gov- 
ernment can  then  directly  identify 
changes  in  aggregate  efficiency  that  in 
actuality  were  due  to  changes  in  the 
difficulty  of  the  incoming  workload. 

2. At the very least, classify the incoming
workload into three to five categories of
difficulty, and identify the proportion of
the workload that falls into each category.
Users then can see whether different workload
composition might explain differences in
aggregate unit costs.
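Below is a minimal sketch covering both approaches. It assumes each job receives a difficulty rating when it arrives, before any cost is incurred, so the rating cannot simply mirror the cost. The 1-to-5 rating scale, the three cut points, and the job records are all hypothetical.

```python
from collections import defaultdict


def summarize_by_difficulty(jobs, classify):
    """Share of workload and average cost per unit for each difficulty category.

    jobs: iterable of (intake_rating, cost) pairs; classify maps a rating to a category label.
    Returns {category: (share_of_workload, average_cost_per_unit)}.
    """
    counts = defaultdict(int)
    costs = defaultdict(float)
    for rating, cost in jobs:
        category = classify(rating)
        counts[category] += 1
        costs[category] += cost
    total_jobs = sum(counts.values())
    return {c: (counts[c] / total_jobs, costs[c] / counts[c]) for c in counts}


def three_levels(rating):
    """Hypothetical cut points on an assumed 1-to-5 difficulty rating made at intake."""
    if rating <= 2:
        return "low"
    if rating <= 3:
        return "medium"
    return "high"


# Hypothetical repair jobs: (intake difficulty rating, cost in dollars).
jobs = [(1, 6.0), (2, 8.0), (3, 10.0), (5, 15.0), (4, 13.0), (1, 7.0)]
summary = summarize_by_difficulty(jobs, three_levels)
for category in ("low", "medium", "high"):
    share, unit_cost = summary[category]
    print(f"{category:<6} {share:4.0%}  ${unit_cost:.2f}")
```

The share column supports the second approach, and the per-category cost column supports the first.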

Government officials seldom consider
workload difficulty. Productivity comparisons
between time periods or between government
facilities are thus made much less useful to
government managers and actually may be
misleading and unfair.

Complicating  the  problem  is  the  fact  that 
there  are  usually  multiple  objectives  and  thus 
multiple  dimensions  of  quality/effectiveness 
for  any  service,  e.g.,  police  agencies  are  si- 
multaneously concerned  with  deterring  crime, 
solving  crimes  and  apprehending  those  com- 
mitting them,  and  controlling  traffic. 

The measurement of the amount of input is
often difficult. For measuring labor
productivity (output per employee hour), whose
hours should be included: only those of the
first-line, direct employees? What about the
first-line supervisors and other indirect,
support personnel (e.g., data processing staff)
whose time also contributes to performance?
For measuring multifactor productivity (e.g.,
output per dollar), which dollars should be
included: only direct personnel costs? What
about the cost of indirect support personnel
and related equipment and facilities? Should
capital costs be included? How? By and large,
few of these questions have been seriously
considered or researched for public sector
productivity measurement. Deficiencies in
cost-accounting systems at all levels of
government further complicate the problem.
Often, data on the amount of labor or dollars
associated with particular activities whose
productivity is to be measured are not
available. In such instances, semiartificial
cost-allocation formulas or estimates have to
be resorted to. (See Hatry et al. 1979, chapter
6, for a more detailed discussion of cost
estimation.)
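The sketch below illustrates one simple cost-allocation formula. It spreads a shared support cost across activities in proportion to their direct labor hours. This allocation basis is an assumption for illustration, and other bases are equally defensible. The activity names and dollar figures are made up.

```python
def allocate_indirect_cost(direct_hours: dict, indirect_cost: float) -> dict:
    """Spread a shared support cost across activities in proportion to their direct labor hours."""
    total_hours = sum(direct_hours.values())
    return {name: indirect_cost * hours / total_hours for name, hours in direct_hours.items()}


def output_per_dollar(output: float, direct_cost: float, allocated_cost: float) -> float:
    """Multifactor-style ratio: output divided by direct plus allocated indirect dollars."""
    return output / (direct_cost + allocated_cost)


# Hypothetical street-maintenance figures.
hours = {"pothole patching": 3_000, "street sweeping": 1_000}
allocated = allocate_indirect_cost(hours, indirect_cost=20_000.0)
print(allocated)  # {'pothole patching': 15000.0, 'street sweeping': 5000.0}
print(output_per_dollar(1_200, direct_cost=45_000.0, allocated_cost=allocated["pothole patching"]))
```

In this example, pothole patching absorbs $15,000 of the $20,000 support cost. Its 1,200 patches therefore cost $60,000 in all, or 0.02 patches per dollar.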

Current  Status  of  Productivity  Measurement 

The current major ongoing effort at classic
productivity measurement is that of the Federal
Government, which annually assesses the
productivity of the Federal work force using
the traditional ratios of output to input. This
will be discussed in a later section.

A 1976 Urban Institute examination of the
budget documents of 245 cities and counties
found that only a small proportion contained
either efficiency or effectiveness measures to
any extent. Only 25 percent displayed at least
one effectiveness measure; 10 percent listed at
least one efficiency measure. Only a very small
number of those jurisdictions presenting
measures went beyond such familiar ones as the
number of crimes, number of fires, number of
traffic accidents, and number of illnesses
(which have been collected largely because of
strong Federal impetus).³ Of course, many local
governments do not put all their performance
measurement information into their budget
documents. Local governments, however, appear
to be increasing their use of performance
measurement, even though the regular, formal
use of productivity measurements still is not
widespread. A Pennsylvania State University
survey in 1982 of 456 local governments with
over 25,000 population found that 26 percent
reported city-wide use of "performance
monitoring" and an additional 42 percent
reported use in "selected areas" (Poister and
McGowan 1983). This finding of 68 percent
reporting use was more than double the under
30 percent reporting use in a similar 1976
survey by the International City Management
Association (Fukuhara 1977).

At the State government level, The Urban
Institute and the National Association of State
Budget Officers (1975) conducted a mail survey
to obtain perceptions of State budget offices
as to the adequacy of existing efficiency and
effectiveness measures. Results of the
responses from 32 States are presented in
exhibit 1. They can be summarized as follows:

³The U.S. Bureau of the Census annually provides
calculations  of  per-capita  costs  for  selected 
expenditure  categories  for  individual  State  and 
local  governments,  but  population  counts  should 
not  be  confused  with  the  output  counts  needed  to 
measure  productivity. 




Exhibit 1. How 32 State budget offices rate the adequacy of available
efficiency and effectiveness measurements

Program area and type of measure    Excellent  Good  Fair  Poor  No opinion

Economic and manpower development
  Efficiency measures                       1     8     7    10           6
  Effectiveness measures                    1     5     9    11           6
Corrections
  Efficiency measures                       1     4    13     9           5
  Effectiveness measures                    0     2    14    11           5
Transportation
  Efficiency measures                       1    14     9     3           5
  Effectiveness measures                    0     7     9    10           6
Physical and mental health
  Efficiency measures                       0     8    16     1           7
  Effectiveness measures                    0     4    12    10           6
Licensing and regulation
  Efficiency measures                       1    11     7     9           4
  Effectiveness measures                    0     8    10     8           6
Public assistance
  Efficiency measures                       0     4     9    15           4
  Effectiveness measures                    0     2    10    14           6
Other social services
  Efficiency measures                       1    10    10     6           5
  Effectiveness measures                    1     4     9    11           7
Parks and recreation
  Efficiency measures                       1     6     7    11           7
  Effectiveness measures                    0     4     5    14           9

All State programs    Efficiency  Effectiveness
                        measures       measures
Quite adequate                 3              0
Adequate                       7              0
Barely adequate                7             11
Inadequate                     8             18
No rating given                7              3
Total                         32             32

Source: The Urban Institute, The Status of Productivity Measurement in State Government: An Initial Examination
(September 1975).


• Of the 32 States responding, 15, or 47
percent, rated existing efficiency measures as
only barely adequate or inadequate; only 10
States, 31 percent, rated their measures as
adequate or quite adequate.

• Of the 32 States responding, 29, or 91
percent, rated current effectiveness measures
as barely adequate or inadequate; none rated
these measures as adequate or quite adequate
(the other 3 had no opinion).
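The percentages in the two bullets follow directly from the "All State programs" panel of exhibit 1. The short check below uses those counts and rounds to whole percents, as the text does. The function and constant names are illustrative.

```python
# Counts from the "All State programs" panel of exhibit 1.
RESPONDING_STATES = 32
efficiency = {"quite adequate": 3, "adequate": 7, "barely adequate": 7, "inadequate": 8}
effectiveness = {"quite adequate": 0, "adequate": 0, "barely adequate": 11, "inadequate": 18}

LOW = ("barely adequate", "inadequate")
HIGH = ("adequate", "quite adequate")


def count_and_percent(ratings, categories, n=RESPONDING_STATES):
    """Number of States giving any of the listed ratings, and that number as a rounded percent of n."""
    k = sum(ratings[c] for c in categories)
    return k, round(100 * k / n)


print(count_and_percent(efficiency, LOW))     # (15, 47)
print(count_and_percent(efficiency, HIGH))    # (10, 31)
print(count_and_percent(effectiveness, LOW))  # (29, 91)
```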


An examination of the State budgets and
other public documents found that very few
States had a significant number of efficiency
and effectiveness measures (The Urban Institute
and NASBO 1975). As with local governments, a
very small number of States accounted for a
large percentage of the measures.




Emerging  Measurement  Methods 

Despite the paucity of productivity
measurement, four potentially significant
developments are emerging:

1. Renewed efforts at efficiency measurement
involving the calculation of ratios of work
accomplished to inputs, with the latter defined
in terms of numbers of employee hours or
dollars

2. Greatly increased use of engineered work
standards

3. Development of effectiveness measurement
procedures along two lines, one using citizen
or client ratings of various service
characteristics and the other using systematic
data collection procedures, such as
predeveloped rating scales and trained observer
procedures, to systematize ratings of service
characteristics (e.g., street cleanliness and
park conditions)

4. Increased interest in comprehensive
measurement systems, possibly using more than
one or even all of the foregoing approaches

Each  will  be  discussed  in  turn. 


Measuring  Output/Input  Ratios 

The use of measures expressed as ratios of
the amount of work accomplished to the number
of employee hours (or dollars) goes back
decades (including the enthusiasm for
performance budgeting in the 1950s). Their use
has been relatively rare in recent years,
however.

The foremost example of current use of these
indicators is provided by the Federal
Government, which since fiscal year 1972 has
annually collected such measures for many
Federal activities. The 1982 Federal effort
included 3,427 output indicators from 47
agencies. A total of 1.7 million employee years
of Federal civilian employment, representing 62
percent of total civilian employment, was
covered (USBLS 1983). Exhibit 2 displays a
sample of the output indicators provided by the
agencies. The Bureau of Labor Statistics uses
these output data, together with other
agency-provided data on the number of employee
years, to compute productivity indices. The
focus is entirely on labor productivity, that
is, output related to the number of employee
years (and not to dollars expended).

Exhibit 2. Sample Federal output indicators

Function                              Sample output indicator

Citizens' records                     Claims processed
Reference services                    Reports issued
Transportation                        Millions of long-tons shipped
Power                                 Kilowatt hours sold
Medical services                      Outpatient visits
Education and training                Student years trained
Agriculture and natural resources     Planning and application services provided
Library services                      Items loaned
Military base services                Millions of meals served
Internal audit                        Investigations completed
Regulation: Rulemaking and licensing  Applications approved

Source: Extracted from the Federal Productivity
Measurement Data Base, Fiscal Year 1977-1978,
Office of Productivity Programs, U.S. Office of
Personnel Management, December 17, 1979. One
representative indicator has been selected from
11 of a total of 28 functional categories.

Based on these data, the U.S. Bureau of Labor
Statistics (1983, p. 1) reported that
productivity for the measured portion of the
Federal work force increased steadily over the
years 1967 through 1982, averaging a 1.5
percent increase per year. There were
substantial variations in the rates of change
for the 28 individual functional groupings for
which productivity indices were developed.
These ranged from -1.5 percent (electric power
production and distribution) to +11.8 percent
(communications) (USBLS 1983, p. 7). The results
are not publicly reported for individual
agencies but only by functional categories. A
full list of the 28 functional categories is
shown in exhibit 3.
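The text reports an average increase of 1.5 percent per year but does not say how that figure was calculated. One standard approach, assumed here, fits a least-squares line to the natural logarithm of the index and converts the slope to an annual rate. A simpler alternative uses only the first and last index values. The index series below is hypothetical and is constructed to grow exactly 1.5 percent a year.

```python
import math


def trend_rate(years, index_values):
    """Average annual rate of change from a least-squares line fitted to ln(index) vs. year."""
    if len(years) != len(index_values) or len(years) < 2:
        raise ValueError("need at least two matching (year, index) pairs")
    n = len(years)
    logs = [math.log(v) for v in index_values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return math.exp(sxy / sxx) - 1.0  # convert the log-slope to a compound annual rate


def endpoint_rate(first, last, n_years):
    """Compound average annual rate using only the first and last index values."""
    return (last / first) ** (1.0 / n_years) - 1.0


# Hypothetical index: 100 in 1967, growing exactly 1.5 percent per year through 1982.
years = list(range(1967, 1983))
index = [100 * 1.015 ** (y - 1967) for y in years]
print(f"{trend_rate(years, index):.2%}")
print(f"{endpoint_rate(index[0], index[-1], years[-1] - years[0]):.2%}")
```

For a series with real year-to-year fluctuation, the two methods generally differ. The endpoint rate depends only on the first and last years, while the trend rate uses every year.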

It is important to note that the output
measures illustrated in exhibit 2 do not
reflect the final products of Federal services.
As the earlier discussion of workload outputs
stressed, these measures by no means reflect
effectiveness in delivering quality health
care, education, defense, economic well-being,
and the like. Also, it is not clear whether
agencies providing the output data consider
the quality of these products or define these
products consistently.

Exhibit 3. Functional groupings used
by the Federal Government

Audit of operations
Buildings and grounds maintenance
Communications
Education and training
Electric power production and distribution
Equipment maintenance
Finance and accounting
General support services
Information services
Legal and judicial activities
Library services
Loans and grants
Medical services
Military base services
Natural resources and environment management
Personnel investigations
Personnel management
Postal Service
Printing and duplication
Procurement
Records management
Regulation-Compliance and enforcement
Regulation-Rulemaking and licensing
Social services and benefits
Specialized manufacturing
Supply and inventory control
Traffic management
Transportation

Source: USOPM 1980, p. 25.

It  is  not  clear  whether  Congress  or  the 
agencies  have  used  these  productivity  data.  In 
addition  to  the  question  of  meaningfulness  of 
such  data,  it  also  appears  that  the  data  pre- 
sented publicly  are  too  aggregative;  they  are 
presented  for  each  of  28  functions  but  not  for 
specific  agencies  or  activities.  The  Bureau  of 
Labor Statistics, however, does provide
output-input ratios and productivity indices on
each indicator to the individual agencies for
their internal analysis and use. (See Mark 1979
for a discussion of the Federal effort and its
problems.)

Despite these failings, the Federal Government
has made a reasonable beginning. It can be
argued that even such limited data can provide
government managers with information for
spurring productivity improvements.

The local government level has witnessed a
growing number of attempts at both efficiency
and effectiveness measurement. But few local
governments currently calculate productivity
indices based on ratios of amount of work
accomplished to input.

The city of Milwaukee has one of the most
extensive systems of unit cost measures, based
on work activity measures, which have been
collected for several years by the budget
office. The city has questioned their
usefulness, however, and currently is moving
to fewer but more selective measurements.

At the State level, the most common form of unit cost measure is found in certain activities for institutionalized clients, e.g., cost of food per day per inmate of correctional facilities. A principal problem in the past has been the lack of use of such measurements for management purposes. Governments use unit cost measures primarily for budgetary preparation and justification; officials use such measures much less to help improve performance. One major exception relates to the use of work standards, which is discussed in the next section.
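A unit cost measure of the kind described for correctional facilities is total cost divided by units of service. Below is a minimal sketch with hypothetical figures. The facility, the month, and the dollar amounts are invented for illustration.

```python
def unit_cost(total_cost: float, units_of_service: float) -> float:
    """Cost per unit of service, e.g., dollars per inmate-day."""
    if units_of_service <= 0:
        raise ValueError("units_of_service must be positive")
    return total_cost / units_of_service


# Hypothetical figures for one 31-day month at a correctional facility.
monthly_food_cost = 93_000.0    # dollars
average_daily_population = 500  # inmates
inmate_days = average_daily_population * 31

print(f"${unit_cost(monthly_food_cost, inmate_days):.2f} per inmate-day")
# $6.00 per inmate-day
```

The same figure that supports a budget request could also be tracked month to month, or compared across facilities, to serve the management purposes the text finds lacking.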


Use  of  Work  Standards 

Local,  State,  and  Federal  Governments  have 
rediscovered  the  industrial  engineer.  In  recent 
years,  a large  increase  has  occurred  in  the 
number  of  government  agencies  using  work 
standards. 

With work standards, individual work activities are examined systematically to determine the amount of time that each activity should require: the standard. Workers subsequently report the actual times required to produce the output. The actual times can then be compared with the standard times to indicate the efficiency of the work force. Standard times are expressed in the form of unit cost measures, that is, the amount of time it should take to provide a specific product. Work standards are employed primarily for relatively routine, repetitive operations, such as clerical activities and street maintenance. For local government use, see ICMA 1974, National Commission on Productivity and Work Quality 1975, and Public Technology 1977; for Federal work standards, see U.S. Army 1973 and USDOD 1977. Exhibit 4 lists a number of applications in local government. Recently, government agencies have begun to apply work standards to less routine work such




Exhibit 4. Examples of applications of work standards in Phoenix, Arizona

Police
  Communications Bureau
    Clerical positions
    Police service/information center
    Police telephones
    Radio dispatchers
  Information Bureau (records)
    Clerical positions

Water
  Water distribution
    Meter repair shop
    Field repair/service crews
  Water production
    Treatment plant operators
    Well sites/pump stations
  Water accounting
    Service orders
    Meter reading

Sewers
  Sewer field repair/service crews
  Treatment plant operators

Library
  Clerical positions
  Overdue book notifications
  Circulation desk
  Orders and processing
  Cataloging-clerical positions
  Book mending

Street maintenance
  Preventive maintenance (sealing, etc.)
  General maintenance (patch crews, etc.)

Traffic engineering
  Sign maintenance
  Paint striping

Computer services
  Keypunch operators

Housing inspections

Real estate
  Title searches

Parks and recreation
  District parks
    Ground maintenance
    Facility maintenance

Public housing
  Grounds maintenance
  Facility maintenance

Building safety
  Plans review
  Inspections

Landfill compaction
  Inspections

Maintenance services
  Building maintenance
  Custodial services
  Remodeling crews
  Equipment management maintenance

Source: Greiner and Hatry 1975.

as  social  service  casework  and  activities  of 
lawyers.  Whether  such  applications  will  prove 
effective  is  yet  to  be  seen. 
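The comparison of actual times with standard times described above is often summarized as a single efficiency percentage. The sketch below uses one common industrial-engineering convention: earned standard hours divided by actual hours. This convention is an assumption; the chapter itself does not prescribe a formula. The task names, standards, and hours are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskRecord:
    name: str
    units_completed: int
    standard_minutes_per_unit: float  # engineered standard for one unit


def earned_hours(tasks: Iterable[TaskRecord]) -> float:
    """Hours the completed work should have taken under the standards."""
    return sum(t.units_completed * t.standard_minutes_per_unit for t in tasks) / 60


def efficiency_percent(tasks: Iterable[TaskRecord], actual_hours: float) -> float:
    """Earned (standard) hours as a percentage of hours actually reported."""
    return 100 * earned_hours(tasks) / actual_hours


# Hypothetical week for a two-person clerical unit.
week = [
    TaskRecord("process permit application", 240, 12.0),  # 48 earned hours
    TaskRecord("answer records request", 300, 6.0),       # 30 earned hours
]
print(efficiency_percent(week, actual_hours=80.0))  # 97.5
```

Values below 100 mean the work took longer than the standards allow, and values above 100 mean it took less time. The figure is only as meaningful as the standards themselves.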

Work standards are probably by far the most used form of productivity measurement.¹ The principles of work measurement and work standards are well known because of the procedure's long use in the private sector. But there are difficulties, and governments have not always followed sound procedures in transferring these principles to the public sector. Standards should not be set until a "best process" for the work activity is identified, so that the standard will not be based on a poor process. Allowances have to be included for worker personal and delay times. Standards should be based on a reasonable worker pace. Products should also have quality standards, so that outputs not meeting those standards are not counted as output. The introduction of work standards is also fraught with employee relations problems. Governments have often introduced standards without adequate communication with, and participation of, employees (Greiner et al. 1977).

¹Ratios such as the "number of cases per caseworker" or "number of pupils per teacher" are not work standards. They indicate the incoming workload per employee and say nothing about the output of the work effort.
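The requirements listed above map onto the usual textbook steps for turning a time study into a standard: a best process, allowances for personal and delay time, a reasonable pace, and quality screening. Below is a minimal sketch assuming the common formulation normal time = observed time × pace rating and standard time = normal time × (1 + allowance). All figures are hypothetical.

```python
def standard_minutes(observed_minutes: float, pace_rating: float, allowance: float) -> float:
    """
    observed_minutes: average time observed for one unit under the "best process"
    pace_rating: observed pace relative to a reasonable pace (1.10 = 10% faster)
    allowance: personal and delay allowance as a fraction of normal time
    """
    normal = observed_minutes * pace_rating   # adjust to a reasonable worker pace
    return normal * (1 + allowance)           # add personal and delay time


def countable_units(units_produced: int, units_failing_quality: int) -> int:
    """Only output that meets the quality standard is counted."""
    return units_produced - units_failing_quality


# Hypothetical time study: 10 minutes observed, worker 10% faster than a
# reasonable pace, 15% personal and delay allowance.
std = standard_minutes(observed_minutes=10.0, pace_rating=1.10, allowance=0.15)
credited = countable_units(units_produced=500, units_failing_quality=20)
print(round(std, 2), credited)  # 12.65 480
```

Leaving out the pace adjustment would set the standard from an unusually fast worker. Leaving out the quality screen would credit rework as output. Either omission produces the distortions the text warns about.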

A 1973 survey of State and local governments found that 61 of 509 responding cities and counties (12 percent) had made use of work standards (Greiner and Hatry 1975). Most of these measurement programs had been established since the 1960s. Of the State governments, 10 of 42 responding governments, or 24 percent, indicated that they used work standards in at least one agency.

It seems quite likely that usage at both State and local levels has increased substantially in the past two decades. A 1976 International City Management Association survey found that 51 percent of the 379 responding local governments reported some form of work standard/work measurement activity (Fukuhara 1977). This figure, however, very likely includes many jurisdictions without systematically obtained (engineered) standards.


Effectiveness  Measurement  Procedures 

A number of jurisdictions have begun to introduce effectiveness measurement procedures on a regular basis. These include Charlotte, N.C.; Dallas, Tex.; Dayton, Ohio; Lakewood, Colo.; New York City; Phoenix, Ariz.; St. Petersburg, Fla.; and San Diego and Sunnyvale, Calif. Effectiveness data are obtained by three principal approaches: analysis of agency records, trained observer ratings (of street cleanliness, park maintenance conditions, road conditions, etc.), and citizen/client surveys (see The Urban Institute and ICMA 1977). The latter two approaches are relatively new to local governments. The cities of Washington, D.C.; New York; Savannah; and Charlotte, for example, have made periodic ratings of the cleanliness of their streets, using inspectors trained in the use of a photographic rating scale. Exhibit 5 illustrates the use of such effectiveness-rating data from the New York City Department of Sanitation. As the exhibit shows, such performance data can be used to compare areas of the jurisdiction, to compare one time period with another, and to compare actual results with targets.
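Exhibit 5 does not say how its "percent better or below target" column was computed. However, every entry in that column is reproduced by (target - actual) / target × 100, applied to the December 1976 rating, where lower ratings are cleaner. The sketch below is a reconstruction inferred from the published figures, not the Department of Sanitation's documented method.

```python
def percent_vs_target(actual: float, target: float) -> float:
    """Percent better (+) or worse (-) than target on a lower-is-cleaner scale.
    Formula inferred from Exhibit 5, not stated in the source."""
    return round(100 * (target - actual) / target, 1)


# (Dec. 1976 rating, Dec. 1976 target) for the commands shown in Exhibit 5.
commands = {
    "Brooklyn North": (1.47, 1.55),
    "Bronx West": (1.50, 1.56),
    "Manhattan West": (1.46, 1.50),
    "Richmond": (1.22, 1.22),
    "Queens North": (1.23, 1.22),
}

scores = {name: percent_vs_target(*pair) for name, pair in commands.items()}
# Rank commands from most better-than-target to most below target.
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name:<15} {scores[name]:+.1f}")
```

Sorting on this score gives the same order as the exhibit's "rank this month" column for the five commands shown.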

The use of regular surveys asking clients to rate various characteristics of services appears to be growing. Surveys of random samples of citizens (such as 500 to 1,000 households) have been undertaken for two purposes. One is to obtain citizen ratings of service characteristics. The other is to collect factual information related to service quality, such as the extent of citizen use of various government programs (e.g., parks and recreation, libraries, and transit), the extent of crime victimization, and the frequency of rat sightings. The cities of Dallas; Dayton; Kansas City, Kans.; Randolph Township, N.J.; and St. Petersburg, Fla., have been unusual, thus far, in undertaking such citizen surveys for these purposes on a periodic basis, thereby permitting trends and progress to be identified. A number of other jurisdictions, small and large, have recently begun to try such surveys, primarily on a one-time basis. These include Palo Alto, Calif.; Sioux City, Iowa; Sunnyvale, Calif.; and Zeeland, Mich. Examples of regular use of such surveys by State governments for assessing service performance are as yet rare, although North Carolina and Wisconsin have undertaken tests of surveys of their citizens.
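The household sample sizes mentioned above (500 to 1,000) can be put in perspective with the standard margin-of-error formula for a proportion, z·√(p(1-p)/n). The sketch assumes simple random sampling with no design effect or nonresponse adjustment. Real citizen surveys would need to consider both.

```python
import math


def margin_of_error_points(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for a proportion
    estimated from a simple random sample of n households."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000):
    # p = 0.5 gives the widest (most conservative) interval
    print(n, round(margin_of_error_points(0.5, n), 1))
# 500 4.4
# 1000 3.1
```

Doubling the sample from 500 to 1,000 households narrows the margin only by a factor of √2. This is one reason survey cost, rather than precision, often drives the choice of sample size.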


Comprehensive  Measurement  Systems 

It is becoming apparent that the complexities of government services require multiple productivity measurements for each service to provide a comprehensive perspective on how productivity is progressing. A single measure seldom captures enough information to give government officials or the public a satisfactory perspective. One effort was the total performance measurement system (TPMS) approach that the U.S. General Accounting Office, the National Center for Productivity and Quality of Working Life, and the Office of Policy Development


Exhibit 5. Illustration of use of effectiveness-measurement information:
New York City citywide street cleanliness, December 1976(a)

Sanitation          Rank   Percent better  Dec.    Dec.    Dec.    Percent of streets acceptable
Command(b)          this   (+) or below    1976    1976    1975    (1.5 or cleaner)
                    month  (-) target      target  rating  rating  Dec. 1975  Dec. 1976  Change

Brooklyn North       1     +5.2            1.55    1.47    1.52    45.1       48.6       +3.5
Bronx West           2     +3.8            1.56    1.50    1.54    53.8       46.4       -7.4
Manhattan West       3     +2.7            1.50    1.46    1.49    53.6       50.4       -3.2
Richmond            10      0.0            1.22    1.22    1.24    86.3       89.5       +3.2
Queens North        11     -0.8            1.22    1.23    1.20    93.1       89.5       -3.6
Citywide                   +1.5            1.36    1.34    1.34    73.1       71.7       -1.4

(a)New York City rates cleanliness on a scale of 1.0 to 3.0. The lower the rating, the cleaner the street.
(b)Five of a total of 11 Sanitation Command areas are shown.

Source: City of New York, The Mayor's Management Report. April 26, 1979.




and Research of the U.S. Department of Housing and Urban Development explored in 1975-78. Test sites included the State of Washington's Department of General Administration; Los Angeles County's Patient Financial Services Division of the Department of Public Health; a number of agencies in the city of Sunnyvale, Calif.; and the Office of Housing Production and Mortgage Credit of the Region IX Office of HUD. Several other governments have undertaken additional trials: the State of New Jersey's water resources agency, the city of Cincinnati's highway maintenance division (see ICMA 1978 for an early assessment), and a variety of agencies in the cities of Manhattan Beach, Long Beach, and San Diego, Calif. (1980).

In TPMS, performance data are collected on both output per unit of input and client perceptions of service quality. Two additional ingredients are included in the TPMS concept: employee attitude surveys, and an attempt to integrate the data from all three sources (customers, hard data, and employees). Analysis of these data is aimed at suggesting productivity improvements. Thus, TPMS serves not only as a measurement tool but also as a productivity analysis tool. The employee attitude information is used to identify productivity problems and is not itself productivity measurement information.
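TPMS is described here only at the concept level. The following is therefore a hypothetical sketch of how the three data sources might be joined for one work unit. The field names, thresholds, and review rule are assumptions, not part of TPMS. Following the text, employee attitude data are attached only as possible explanations and are never scored as productivity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PeriodReport:
    unit: str
    output_per_staff_hour: float   # hard data (efficiency)
    pct_clients_satisfied: float   # client survey (effectiveness)
    employee_concerns: List[str] = field(default_factory=list)  # attitude survey


def review(prev: PeriodReport, curr: PeriodReport,
           max_efficiency_drop: float = 0.05,
           max_satisfaction_drop: float = 5.0) -> List[str]:
    """Flag declines in the two performance measures (hypothetical thresholds);
    attach employee concerns only as possible explanations, not as scores."""
    findings = []
    if curr.output_per_staff_hour < prev.output_per_staff_hour * (1 - max_efficiency_drop):
        findings.append("efficiency declined")
    if curr.pct_clients_satisfied < prev.pct_clients_satisfied - max_satisfaction_drop:
        findings.append("client ratings declined")
    if findings and curr.employee_concerns:
        findings.append("employee survey suggests: " + "; ".join(curr.employee_concerns))
    return findings


before = PeriodReport("permits", 4.0, 82.0)
after = PeriodReport("permits", 3.6, 74.0, ["outdated forms", "unclear priorities"])
print(review(before, after))
```

Keeping attitude data out of the decline tests mirrors the distinction the text draws between measuring productivity and diagnosing why it changed.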

The General Accounting Office felt the initial TPMS results were encouraging. More State and local governments appear to be trying these individual components, but they are not generally linked into one integrated system. The concept of using a variety of information on efficiency, effectiveness (including client feedback), and employee attitudes is provocative and merits careful evaluation (see National Center for Productivity 1978 for a description of the process).

Most of the governments identified in the section on effectiveness measurement procedures have also attempted to develop comprehensive measurement systems encompassing both efficiency and effectiveness information. Probably the largest single effort has been that of New York City. Under a charter requirement to provide at least annual reports on agency performance, the city has been developing two systems: a productivity analysis reporting system, with an emphasis on unit costs and work standards, and a service quality measurement system, emphasizing service quality/effectiveness (see, e.g., City of New York 1979). These efforts at comprehensive performance measurement, however, must still be considered as being in their childhood, if not in their infancy.


Whose  Productivity  Is  Being  Measured? 

The current state of the art in productivity measurement depends to a considerable extent on whose productivity is to be measured. Such assessments may be directed at the performance of individuals or of groups of employees (e.g., small work groups or whole agencies), and they may be directed at various types of work. Productivity measurements can evaluate activities that are relatively routine and repetitive (e.g., clerical work) or complex and varied (e.g., the work of caseworkers and therapists). They can also be directed at supervisors and managers. They can even be directed at highly complex jobs whose specific outputs are difficult to identify and whose consequences are not likely to be seen for years, such as those of engineers, scientists, and grant monitors. Such jobs are more common at the Federal level than in State or local governments.

Exhibit 6 summarizes the author's judgment of the current state of the productivity measurement art for these categories. Measuring


Exhibit 6. Current ability to measure productivity

                     Type of work
Whose                Routine and       Complex individual     Engineer/scientist/   Supervisory/
productivity         repetitive work   contributor work       grant monitoring      managerial
is measured          (e.g., clerical)  (e.g., caseworkers
                                       and clinicians)

Individual           High              Medium                 Low                   Low
Group                High              Medium                 Low                   *

*Depends on the particular type of work supervised.




repetitive work is well within the current state of the art, because outputs are relatively easy to identify and there is long experience with work measurement and work standard procedures. Emerging techniques for assessing outcome improvements in a variety of services suggest that measurability is improving for more complex activities, such as the work of caseworkers and mental health clinicians. Assessing an individual employee's contribution, however, remains dubious, since more than one employee may contribute to a given product. Supervisors and managers are a different case. If it is accepted that they should be measured by the work of their group, measurement becomes more feasible, and measurability then depends largely on the type of work they supervise.

Intergovernmental Comparative Productivity Measurement

Many  government  officials  would  like  norms 
or  standards  against  which  to  compare  the 
performance  of  their  own  services.  These 
could  be  national  standards  or  performance 
levels  for  similar  governments.  Comparative 
data  can  be  a stimulus  to  encourage  low 
performers  to  improve. 

Currently, however, few comparative, across-government productivity measurements exist. Some comparative effectiveness indicators are compiled, such as crime and arrest rates, numbers of fire incidents and amount of fire loss, pollution counts, unemployment and family income, and mortality and morbidity figures.

For classic productivity measures (output-input ratios), there are even fewer comparative data. Some ad hoc studies have been done, such as those for solid waste collection that provide norm data on cost per household (Columbia University et al. 1978), but these data are not collected regularly and so cannot indicate trends. The Federal Government collects annual data on expenditures per capita for a number of service categories, but, as noted earlier, these are not productivity measures. (Number of residents is not an indicator of output.) The Federal Government has not undertaken any program to regularly obtain intergovernmental productivity data.

A major part of the problem is that few individual governments collect productivity data regularly. Where they do, each uses somewhat different procedures (e.g., differences in how outputs and inputs are defined). Even data on work standards for common services, such as vehicle repair or standard clerical or data processing operations, have seldom been collected and compared for multiple sites. Each government develops its own standards, and these are not, in general, shared with other governments (for an interagency exception, see USDOD 1977).


Prospects  for  the  Future 

What are the prospects for the future of productivity measurement? A number of facilitating and inhibiting factors exist.

Facilitating  Factors 

• A number of pressures are encouraging more efforts at improving productivity as a way to decrease or contain expenditures. These include:
  - State and local government fiscal crises (or near-crises);
  - pressure from the Federal Government on State and local governments for program evaluation;
  - the growth of performance auditing spurred by the U.S. General Accounting Office;
  - growing citizen group pressures for accountability;
  - sunset laws requiring periodic review of government activities; and
  - the increase in "management-by-objectives" type programs.
Each of these appears to require adequate measurement of productivity on a regular basis, encompassing both efficiency and effectiveness.

• There has been considerable development of the state of the art of measurement, shown in citizen and client survey techniques, systematic trained observer approaches, and others.

• The Federal Government hopes to improve the nature and scope of the current Federal measurement system. The aims are to add measures of multifactor productivity, e.g., output per dollar, and to take into consideration the quality and effectiveness of services (USOPM 1980, pp. iii and 16-21).

• Adding to all this is the trend for schools of public administration and similar professional graduate schools to expose students to a wide variety of quantitative tools. As a result, the managers and public officials of the future will be familiar with productivity information and more likely to want and use it.

Inhibiting  Factors 

Despite these favorable factors, however, considerable constraints exist on the widespread emergence of productivity measurement.

• The analytical capacity of State and local governments is likely to remain limited. The measurement effort discussed here can require considerable data collection and analysis. The data collection procedures themselves may not be very expensive; however, some in-depth analysis is needed to obtain the full benefits from such regular measurement. Smaller governments will not be able to apply much in the way of new resources to such new procedures. Unless they receive assistance from their region or State, or from the Federal Government, their ability to make substantial efforts will continue to be limited. Such efforts include undertaking comparative measurements with other governments, or even supporting annual citizen surveys. The larger governments, however, including most States, should be able to undertake at least some such data collection and analysis.

• An underlying issue at all levels of government is the problem of setting up regular uses for performance measures that are attractive enough to make agency people wish to cooperate in the measurement effort and use the results. Improvements in government management incentive systems are needed to encourage administrators to analyze productivity and implement change. The Civil Service Reform Act of 1978 includes performance incentives and aims to increase the personal accountability of top managers, so it seems inevitably to require improved Federal productivity measurement. Many State and local governments have also begun introducing performance "contracts," or even incentive pay plans tied to performance, for at least their managerial employees (Greiner et al. 1981; USOPM n.d.). These should spur the recognition of current deficiencies and encourage improvements in productivity measurement. At present, however, neither government managers nor elected public officials have indicated much willingness or ability to use productivity data. This is due, in part, to the lack of experience with comprehensive, reliable productivity measurements. Usage should grow as the data themselves become more worthy of use.

• The lack of comparative productivity data constrains interest in productivity measurement. Annual samples of local government performance on various measures, collected with standardized procedures, might be undertaken nationally or regionally. Individual governments that collect the same data themselves could then compare themselves against these norms as a spur to corrective action. State and local government officials and the Federal Government need to consider whether such comparative data will be more constructive than troublesome. If and when an effort is made to develop comparative data, the information obtained should take into account the difficulty of the incoming workload and the level of service provided. Even without intergovernmental data, an individual government can still compare its own performance from one year to the next, from one geographic area to another, or from one facility within the jurisdiction to the next, at least for some measures.
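One way to act on the suggestion that comparative data take account of the level of service is to compare a government only with peers offering a similar service, then report its percentile position among them. The sketch below works under that assumption. The cost figures and service-level categories are hypothetical and are not drawn from Columbia University et al. (1978) or any other survey.

```python
from bisect import bisect_left
from typing import Dict, List


def percentile_rank(value: float, norms: List[float]) -> float:
    """Percent of comparison governments whose value is strictly below `value`."""
    ordered = sorted(norms)
    return 100 * bisect_left(ordered, value) / len(ordered)


# Hypothetical annual refuse-collection cost per household, grouped by level of
# service so that each government is compared only with similar services.
norms_by_service_level: Dict[str, List[float]] = {
    "curbside": [38.0, 41.5, 44.0, 47.0, 52.0],
    "backyard": [61.0, 66.0, 70.5, 74.0, 80.0],
}

own_cost, own_service_level = 45.0, "curbside"
print(percentile_rank(own_cost, norms_by_service_level[own_service_level]))  # 60.0
```

Here a curbside government costs more than 60 percent of its curbside peers. Measured against backyard-service cities instead, the same government would appear cheaper than all of them, which shows why the level of service matters in any norm.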


Prospects 

The Federal Government has shown intermittent interest in comparative productivity measurements for State and local government services for several years. Past interest was expressed by the National Productivity Council (USOPM 1979, pp. 30-31), the Joint Economic Committee of the U.S. Congress (1979, p. 7), and the National Research Council (1979, p. 10). The most recent Federal effort has been that of the U.S. Bureau of Labor Statistics. It examined comparative productivity data for three services: electric utilities, State alcoholic beverage control operations, and unemployment insurance. It also explored the possibilities for future measurement of four other services: solid waste collection and disposal, drinking water supply, mass transit, and the unemployment service (USBLS 1983). Tight funding at the Bureau of Labor Statistics and the Office of Personnel Management, however, indicates that major support in the near future is not likely. That funding squeeze has included the dropping of OPM's own productivity improvement program.

Governments have just begun to consider the wide range of productivity measurement options, and considerable experimentation is likely in future years. Help may be forthcoming in the form of training and research by universities and others, along with some limited technical assistance, perhaps provided by State or regional agencies. If it is, substantial, albeit slow, progress can be expected over the forthcoming decade.


References 

American Association for the Advancement of Science, Office of Public Sector Programs. Appendix C: Effectiveness/productivity measurement research and technical assistance needs. In: Report from the Workshop on Management, Finance, and Personnel. Washington, D.C.: AAAS, 1978.

Cities of Manhattan Beach, Long Beach, and San Diego, Calif. Total Performance Management Project report series. April 1980.

City of New York. The Mayor's Management Report. April 26, 1979.

Columbia University, Public Technology, Inc., and International City Management Association. Evaluating Residential Refuse Collection Costs: A Workbook for Local Government. Washington, D.C.: ICMA, 1978.

Fukuhara, R. Productivity improvement in cities. In: The Municipal Yearbook 1977. Washington, D.C.: International City Management Association, 1977.

Greiner, J.M., and Hatry, H.P. Employee Incentives to Improve State and Local Government Productivity. (U.S. Govt. Print. Off. Stock No. 052-003-00090-3.) Washington, D.C.: National Center for Productivity and Quality of Working Life, 1975.

Greiner, J.M.; Dahl, R.E.; Hatry, H.P.; and Millar, A.P. Monetary Incentives and Work Standards in Five Cities: Impacts and Implications for Management and Labor. Washington, D.C.: The Urban Institute, 1977.

Greiner, J.M.; Hatry, H.P.; Koss, M.P.; Millar, A.P.; and Woodward, J.P. Productivity and Motivation: A Review of State and Local Government Initiatives. Washington, D.C.: The Urban Institute, 1981.

Hatry, H.P.; Clarren, S.N.; van Houten, T.; Woodward, J.P.; and Donvito, P.A. Efficiency Measurement for Local Government Services: Some Initial Suggestions. Washington, D.C.: The Urban Institute, 1979.

International City Management Association. Work measurement in local governments. Management Information Service Report 6(10), 1974.

International City Management Association. Total Performance Management in Cincinnati, Ohio. Managerial Innovations Report #27. Washington, D.C.: the Association, 1978.

Joint Economic Committee, Congress of the United States. Productivity in the Federal Government: A Staff Study. Washington, D.C.: Supt. of Docs., U.S. Govt. Print. Off., 1979.

Mark, J.A. Measuring Federal productivity: Problems and progress. Civil Service Journal, January/March 1979.

National Center for Productivity and Quality of Working Life. Total Performance Management: Some Pointers for Action. (Stock No. 052-003-00577-8.) Washington, D.C.: Supt. of Docs., U.S. Govt. Print. Off., 1978.

National Commission on Productivity and Work Quality. Improving Municipal Productivity: Work Measurement for Better Management. Washington, D.C.: the Commission, 1975.

National Research Council, National Academy of Sciences. Measurement and Interpretation of Productivity. Washington, D.C.: the Council, 1979.

Neumann, B.R. Hospital productivity: An evaluation of proposed measurement methods. Public Productivity Review 1(5):23-26, 1976.

Poister, T.H., and McGowan, R.P. Municipal management capacity: Productivity improvement and strategies for handling fiscal stress. In: The Municipal Yearbook 1984. Washington, D.C.: International City Management Association, 1984.

Public Technology, Inc. Improving Productivity Using Work Measurement. Washington, D.C.: PTI, 1977.

Ross, J.P., and Burkhead, J. Productivity in the Local Government Sector. Lexington, Mass.: Lexington Books, 1974.

The Urban Institute and International City Management Association. How Effective Are Your Community Services: Procedures for Monitoring the Effectiveness of Municipal Services. Washington, D.C.: the Institute and ICMA, 1977.

The Urban Institute and the National Association of State Budget Officers. The Status of Productivity Measurement in State Government: An Initial Examination. Washington, D.C.: the Institute and the Association, 1975. Available from NTIS, Springfield, Va. (Stock No. SHR0000422/LLC.)

U.S. Army Engineering Training Agency. Work Measurement Guidelines for Federal Government Managers. Rock Island, Ill.: the Agency, 1973.

U.S. Bureau of Labor Statistics. Federal Government Productivity Summary Data: Fiscal Years 1967-1982. Washington, D.C.: the Bureau, Office of Productivity and Technology, 1983.

U.S. Bureau of Labor Statistics. Measuring Productivity in State and Local Government. Bulletin 2166. Washington, D.C.: Supt. of Docs., U.S. Govt. Print. Off., Dec. 1983.

U.S. Department of Defense. Standardization of Work Measurement: Basic Volume General Guidance. (5010.15.1-M.) Washington, D.C.: the Department, 1977.

U.S. Joint Financial Management Improvement Program. Government Productivity: Volume 1. Washington, D.C.: the Program, 1976.

U.S. Office of Personnel Management, Federal Study Team. Report to the National Productivity Council: Federal Activities To Support State and Local Government Productivity Improvement. Washington, D.C.: the Office, 1979.

U.S. Office of Personnel Management, Workforce Effectiveness and Development Group. Measuring Federal Productivity: A Summary Report and Analysis of the FY78 Productivity Data. (WPA-2.) Washington, D.C.: the Office, 1980.

U.S. Office of Personnel Management, Office of Intergovernmental Personnel Programs. Intergovernmental Personnel Notes. Washington, D.C.: the Office, n.d.




Part II

The  State  of  the  Art  of  Management 


Introductory  Comments 

The state of the art of program management is important for the success of program performance measurement systems in at least three ways. The first involves the extent to which managers structure and articulate programs in the form of clear goals and objectives for which performance indicators can be developed. The relevance of measures to the program depends on the actual importance of the performance indicators chosen, so the ability to specify what is really important to the program is critical. Second, management matters through the extent to which managers use program performance measures in their planning and administration. Finally, management influences performance measurement through the extent to which the decisions that shape a program come from stakeholders outside the organization. The ability of those stakeholders to participate in governing the program depends on the availability of program information.

Each  of  these  effects  will  be  considered 
here  briefly. 

Specification  of  Program  Goals 
and  Objectives 

One of the tasks of an organization's management is leadership, which involves at least four functions: to point out the direction in which the organization is to move, to specify how it can get there, to provide the resources necessary to get there, and to motivate staff to use the resources and chosen procedures to reach the goals. Logically, the first task is to identify where the organization should go or what purposes it serves. For profitmaking organizations, this task appears given by definition. Complications ensue, however, from the need to consider tax implications (which may turn losses into profits), the interaction and interchangeability of power and profits, and other interests and values of the owners and managers of companies. Further, all organizations have goals that go beyond profitmaking. These include avoiding moral and social costs that the organization is unwilling to bear for the sake of profits, and survival goals that spring from the interests of employees. For nonprofit organizations, goals usually are stipulated in broad form in the group's charter. Again, maintenance goals also are present but seldom explicitly stated.

In defining the goals of the organization, managers usually try to accomplish at least three purposes: to tell employees the purposes of the functions they are asked to carry out, to motivate employees to carry them out by clarifying the functions' intrinsic worth, and to describe the functions in a way that will cause outsiders to support, patronize, and cooperate with the organization. These purposes often conflict, however. Clear directions may be less motivating than noble but vague ideals. The interests of a program's employees and of outsiders frequently differ. Goals are often phrased rhetorically to capture the endorsement of as wide a range of groups as possible. At this level of generality, the actions the organization could or would take to achieve lofty goals usually are not at all clear.

This  condition  of  ambiguity,  if  not  incon- 
sistency, is  not  peculiar  to  service-providing 
organizations.  Law-making  bodies  also  com- 
promise the  clarity  of  goals  and  the  precision 
of  proposed  actions  in  order  to  get  enough 
support  from  diverse  interests  to  pass  legis- 
lation. Because  laws  frequently  are  the  basic 
charter  documents  of  public  organizations, 
ambiguity  about  the  goals  of  public  programs 
is  widespread. 

Some writers distinguish between goals, which are quite general and nonspecific as to means, and objectives, which are time-limited, specific, and phrased in terms of operational measures that permit their achievement to be assessed. This distinction is quite important for performance measurement.

The artificiality of goals and objectives has been implied already. A program's goals are enunciated for a variety of purposes. They include much more than the values that actually guide a program, and they omit important values such as the personal motives of program staff. Further, they may be quite time-limited. A program initiated to meet certain societal conditions may find during its life that these conditions have changed, sometimes because the program was successful in changing them or, more likely, because of events outside the program's control. Program evaluators distinguish between evaluation oriented around managers' goals (and presumably closely related to the managers' action options and therefore more likely to have program impacts) and evaluation that takes a systems perspective, examining program activities and impacts in relation to the variety of values held by societal stakeholders in the program. These stakeholders include program clients, taxpayers, program staff, and program competitors, as well as managers and funders.

Clearly, if program evaluation is expected to be a tool for program management, the goal-directed perspective seems appropriate. However, insofar as program evaluation seems most often to serve managers by educating them about actions they take on other grounds (Patton 1978; Weiss and Bucuvalis 1980), and also serves groups other than managers (Krause and Howard 1976), a systems perspective may be more useful. Similar considerations apply to performance measures. That is, while some measures should be chosen to provide information that managers identify as relevant to their decisions, others should be chosen to represent the values of other stakeholders in the program.

Deutscher (1976) has warned against falling into the "goal trap," that is, accepting stated goals as reflecting actual program intentions. This warning can be applied to performance measures by making sure that, although some measures are based on the stated program goals, others are chosen to represent additional values or alternative goals.

It should be noted that the goal-based approach to evaluation is by far the most popular with managers and evaluators. First, it is consistent with the commonsense view of management as rational, purposive, and in control of the program. Second, managers wish to cultivate this image of being in control, because it gives them status; and since managers employ and direct program evaluators, they set the frame of reference in which evaluators work. Third, it is simpler to use this fairly narrow approach than to undertake a broad search for the values of other stakeholders and to consider what these values might be in the future. Fourth, some evidence suggests that a goal-based approach may yield greater benefits than a systems analytic approach. These benefits may result in part from byproducts for management or program operations. For example, the goal attainment scaling approach of Kiresuk and Sherman (1968) has immediate side benefits for the therapy process by increasing both therapists' focus and patient participation (Willer and Miller 1976). Similarly, specifying a program's goals to guide an evaluation effort gives management an occasion to orient staff and secure their participation in an important organizational function.

Walsh (1980) concluded from a comparative study of goal-based and goal-free approaches to educational program evaluation that the former

. . . satisfied the requirements of the evaluation task more successfully than did the goal-free approach. This outcome was attributed to the internal consistency found in the goal-based approach with regard to the purpose, issues, evidence, data gathering, analysis, and reporting of the evaluation, and the match between what transpired and what was needed. Such consistency in match was not found in the goal-free approach. The goal-free approach was found to excel in surfacing issues to consider in an evaluation and also in providing insights regarding perspectives which might be taken in viewing a program.

One feature distinguishing goal-oriented approaches from those that do not focus on goals is whether one is trying to be prescriptive and idealistic or descriptive and pragmatic. Over the past 30 years, attempts have been made to render U.S. Federal policy and programs more rational. These Federal procedures, which have served as models for other levels of government, include the planning-programing-budgeting system (PPBS), management by objectives, and zero-based budgeting. Usher and Cornia (1981) examined the budgets and budget manuals from 123 large American cities to determine the degree to which goal-setting and performance assessment had been formally incorporated into municipal budgeting. They found that most cities require agencies to state goals, but in no more than half the cases were these goals useful standards of performance.

Management  Use  of  Performance  Measures 

Frequently, program evaluators, researchers, and data technicians blame program managers and policymakers for the low utilization of program evaluation results or findings from applied research. They also may go beyond blaming specific individuals to blaming the political or social context in which their technical functions are embedded and by which they are supported. Two bases for this blame can be identified. One is that the amount of use of research findings and program evaluation results does indeed seem disappointingly low from the perspective of those who believe that programs operate rationally to pursue their alleged goals, be they profit or benefit to consumers and the public. The other is that the power to implement results seems to lie largely in the hands of policymakers and program administrators (although these politicians and administrators may sometimes dispute that they hold such power).

Even if these attributions of responsibility are correct, it is far from clear that any blame should be leveled at policymakers and administrators for making little use of research findings. The assumption that politicians and administrators can and should use the results of research and evaluation overlooks the fact that many factors other than the results of a given study must be included in policy formulation, program implementation, and management.

These other factors fall into two types. The first consists of considerations that, for practical reasons, lie beyond the range any given study can encompass. These include issues that did not seem important to the researchers and issues that could not easily be observed or assessed. The second consists of personal and/or political motivations. Researchers who assume that programs do or should follow a rational model regard personal and political concerns as outside the scope of factors that should influence program operation.

As a matter of fact, such personal and political factors often dominate program operations. Rich (1981) found that the likelihood that information will be used in policymaking depends less on the appropriateness of the information to the substantive policy than on its utility to bureaucratic interests. Personal and political factors often have considerably more legitimacy than researchers who adopt a rational model of program operations recognize. The attitudes of employees, clients, and the public are crucial in determining the success of any program. Staff members can block changes they feel are not in their interests.

Political  ideologies  also  often  suggest  that 
personal  and  political  factors  need  to  be  rec- 
ognized. In  a democracy,  public  service  pro- 
grams are  expected  to  serve  the  values  of  the 
general  public.  Participation  by  the  public  in 
designing  these  programs  usually  is  seen  as  one 
of  these  values. 

Criticism of management by researchers and evaluators seems to reflect some combination of researchers' excessive ambition (in that they wish their results to be a dominant influence in management), scapegoating (in that if their results are not persuasive, they blame the managers for not being persuaded), and lack of understanding of the scope of considerations in program administration, including the costs of change.

The use of program evaluation results or research in determining how an organization should function is part of the model of rationality that the classical sociological literature attributes to bureaucracies (Coser and Rosenberg 1964). Although we live in a technologically oriented society that gives high status to science, knowledge, and technology, many shortfalls have been observed in how rationally humans can operate, either as individuals or in social units. The growth in government over the past several decades suggested that the United States was tending toward a centrally planned economy and away from a free market system of competitive capitalism. At the least, there seemed to be a need for increasing countervailing influence by government to limit possible abuses under capitalism. The recent popularity of attacks upon government, shown by the election of a President who argued the virtues of a free market system and the vices of government regulation and planning, suggests broad disillusionment with either the planning process or the assignment of this process to government.

Ouchi (1980) argued that relationships do exist between types of organizations and the conditions under which they will succeed or fail. He distinguished three types of organizations on the basis of the mechanism each employs to facilitate a stable pattern of transactions among task-interdependent parties: markets (which use the mechanism of prices), bureaucracies (which use the mechanism of formal authority), and clans (which use the mechanism of socialization of members).

Under conditions where performances of the parties may be audited with precision, but where goal congruence between parties is minimal, markets will succeed; when performance becomes unmeasurable, however, markets will fail. Under conditions where performances of the parties cannot be measured unambiguously, but where goal congruence between parties is complete, clans will succeed; they will fail, however, if opportunism erodes goal congruence. When neither extreme condition exists (that is, when neither performance measurability nor goal congruence can be assured), markets and clans can be expected to give way to the rules of formal authority provided by the bureaucratic form of organization. One is then led to conclude that bureaucracies are more robust than either markets or clans, both of which rely on extreme conditions that obtain only rarely in collective activity. (Kimberly et al. 1980, p. 341)

Ouchi's  analysis  suggests  that  program 
performance  measurement  as  a function 
should  favor  market  conditions. 
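Ouchi's conditions can be restated as a small decision rule, which makes the role of measurability easier to see. The sketch below is only an illustration of the quoted passage, not Ouchi's own formalism. The names `Mechanism` and `viable_mechanisms` are hypothetical, and reducing measurability and goal congruence to yes/no flags is an assumption (Ouchi treats them as matters of degree). The passage does not address the case where both conditions hold; returning both markets and clans there is also an assumption.

```python
from enum import Enum
from typing import Set


class Mechanism(Enum):
    """Ouchi's three mechanisms for mediating transactions among parties."""
    MARKET = "prices"
    BUREAUCRACY = "formal authority"
    CLAN = "socialization of members"


def viable_mechanisms(performance_measurable: bool, goals_congruent: bool) -> Set[Mechanism]:
    """Hypothetical yes/no restatement of the passage quoted above."""
    viable: Set[Mechanism] = set()
    if performance_measurable:
        # Precise auditing lets prices mediate, even with minimal goal congruence.
        viable.add(Mechanism.MARKET)
    if goals_congruent:
        # Complete goal congruence lets socialization mediate, even without measurement.
        viable.add(Mechanism.CLAN)
    if not viable:
        # Neither extreme condition holds: fall back on formal authority.
        viable.add(Mechanism.BUREAUCRACY)
    return viable


print(viable_mechanisms(True, False))   # markets only
print(viable_mechanisms(False, False))  # bureaucracy only
```

Read this way, performance measurement acts on the first flag: by making performance auditable, it widens the set of situations in which price-like, market mechanisms can operate.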

Mintzberg's (1973) empirical study of the nature of managerial work indicates the potential importance performance measurement could have for managers. Mintzberg found that the manager's power derives from his or her superior information, which comes from many sources. However, Mintzberg concluded:

There is no science in managerial work. Managers work essentially as they always have, with verbal information and intuitive (nonexplicit) processes. The management scientist has had almost no influence on how the manager works. . . . The management scientist . . . can provide significant help for the manager in information-processing and strategy-making, provided he can better understand the manager's work and can gain access to the manager's verbal data base. (p. 5)

Poister and McGowan (1984) surveyed the prevalence of selected management systems and strategies in 1982-83 in the governments of cities between 25,000 and 1 million in population. They found "substantially increased use" of a variety of management tools since a 1976 survey. "Performance monitoring," defined as "the regular monitoring of programs using key effectiveness and efficiency indicators to track performance," was reportedly used by 68 percent of reporting municipalities in 1982-83, compared with only 28 percent in 1976. It was usually applied in selected areas rather than city-wide. When asked to rate these techniques' effectiveness "as an aid to sound program administration and decisionmaking," only 29 percent of the respondents who reported using performance monitoring rated this technique "very effective." The authors concluded that "clearly many cities have developed systems of performance measurement, at least in some service delivery areas; but in the eyes of the survey respondents in many cases, they have not been used very effectively" (p. 219).
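The survey percentages have different bases, which makes them easy to misread. Assuming that the 29 percent is a share of users only and the 68 percent a share of all reporting municipalities (as the wording suggests), multiplying them gives the share of all reporting cities that both used performance monitoring and rated it very effective. The resulting figure is a derived illustration, not one reported by the survey, and the helper name `share_of_all` is hypothetical.

```python
def share_of_all(users_share: float, share_among_users: float) -> float:
    """Convert a share measured among users into a share of all respondents."""
    return users_share * share_among_users


# Survey figures quoted above, expressed as fractions.
used_1976 = 0.28             # reporting cities using performance monitoring, 1976
used_1982 = 0.68             # reporting cities using performance monitoring, 1982-83
very_effective_users = 0.29  # users rating it "very effective"

overall = share_of_all(used_1982, very_effective_users)
growth = used_1982 / used_1976

print(f"Used it and rated it very effective: {overall:.1%} of reporting cities")
print(f"Growth in use since 1976: {growth:.1f}x")
```

On that assumption, roughly one reporting city in five both used performance monitoring and judged it very effective, even though use had grown about 2.4-fold since 1976.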

Performance measurement makes information and its implications for the organization highly explicit. Homer Hagedorn, in the chapter that follows, considers how well managers can profit from this technology.

There is another side to the art of management than how to use information to make decisions: how to shape information to influence others' decisions. While the most usual view of managers is of decisionmakers shaping their programs on the basis of their superior information or instincts, in an increasingly interdependent world many managers find that much of their time and effort is spent coping with their external environment, pursuing funds, expanding mandates, and establishing relationships with other organizations that control resources and legitimacy. Rather than shaping their own program by decisions, they are trying to shape the decisions of others toward their program. This means that they may "manage" information that goes to funders and regulators. This is only one of several reasons that performance measurement may prove dysfunctional to programs, as Pauline Ginsberg explicates in a second chapter on the state of the art of management.

A final aspect of the manager's role in using the results of performance measures is how much value the manager sees in the performance measures, as compared with the total costs of obtaining this information (Allan and Solomon 1983). While some aspects of the costs of program performance measurement can be measured quantitatively, others cannot. Thus, marginal dollars expended can often be calculated. However, nonmonetary costs are more difficult to quantify, and how they can be added to dollar costs is unclear. Further, one aspect of costs is the loss of benefits from forgone alternatives. Since benefits are difficult to assess in all their dimensions and over long time frames, quantitatively assessing the loss in potential benefits from forgone alternatives is beyond current research skills. Nonetheless, managers need to judge the value of the performance measurement information they receive in relation to its costs, so that they can decide whether to support a performance measurement system and what form it should take.

Breadth  of  Participation  in 
Program  Management 

Some evidence (summarized in Windle 1979) indicates that program evaluation information with action implications is more likely to be used if it is made public than if it is simply shared with the program managers. Thus, the general public, program clients, employees outside of management, program funders, regulators, opponents, and competitors all may be part of the policymaking and management process. The potential for these groups to be influential will depend not only on their power and their interest in the program but also on how much information about the program they have. Performance measurement systems that continue over enough time for their existence and significance to become known provide information on which outside groups can act. For this reason, such systems are likely to be required of projects funded by the government, which then can use these measures to monitor project compliance with funding requirements. In addition, by making these measures public, the government can enlist outsiders in the monitoring process, to the extent that these outsiders agree with the criteria represented by the performance measures.

This potential of performance measurement to broaden the range of participants in management may hold little appeal to the managers of particular projects (Katz 1977), but it may be important to preserving a participatory democratic civic culture.


References 


Allan, M.C., and Solomon, L.C., eds. The Costs of Evaluation. Beverly Hills, Calif.: Sage, 1983.

Coser, L.A., and Rosenberg, B. Sociological Theory: A Set of Readings. 2d ed. New York: Macmillan, 1964.

Deutscher,  I.  Toward  avoiding  the  goal-trap  in 
evaluation  research.  In:  Abt,  C.C.,  ed.  The 
Evaluation  of  Social  Programs.  Beverly 
Hills,  Calif.:  Sage,  1976.  pp.  249-268. 

Katz,  J.  Cover-up  and  collective  integrity:  On 
the  natural  antagonisms  of  authority  in- 
ternal and  external  to  organizations.  Social 
Problems  25:3-17,  1977. 

Kimberly, J.R.; Miles, R.H.; and associates, eds. The Organizational Life Cycle. Washington, D.C.: Jossey-Bass, 1980.

Kiresuk,  T.J.,  and  Sherman,  R.E.  Goal  at- 
tainment scaling:  A general  method  for 
evaluating  community  mental  health  pro- 
grams. Community  Mental  Health  Journal 
4:443-453,  1968. 

Krause, M.S., and Howard, K.I. Program evaluation in the public interest: A new research methodology. Community Mental Health Journal 12:291-300, 1976.

Mintzberg, H. The Nature of Managerial Work. New York: Harper and Row, 1973.

Ouchi, W.G. A framework for understanding organizational failure. In: Kimberly, J.R.; Miles, R.H.; and associates, eds. The Organizational Life Cycle. Washington, D.C.: Jossey-Bass, 1980. pp. 395-430.

Patton, M.Q. Utilization-Focused Evaluation. Beverly Hills, Calif.: Sage, 1978.

Poister,  T.H.  and  McGowan,  R.P.  The  use  of 
management  tools  in  municipal  government: 
A national  survey.  Public  Administration 
Review  44:215-223,  1984. 

Rich, R. Social Science Information and Public Policy-Making: The Interaction Between Bureaucratic Politics and the Use of Survey Data. Washington, D.C.: Jossey-Bass, 1981.

Usher, C.L., and Cornia, G.C. Goal setting and performance assessment in municipal budgeting. Public Administration Review 41:229-235, 1981.

Walsh, P.L. An empirical, evaluative comparison of the goal-based and goal-free approaches to educational program evaluation. Ph.D. dissertation. University of Illinois at Urbana-Champaign, 1980. (Dissertation Abstracts International, Vol. 41/11-A, p. 4664.)

Weiss, C.H., and Bucuvalis, M.J. Truth tests and utility tests: Decision makers' frames of reference for social science research. American Sociological Review 45:302-313, 1980.

Willer, B., and Miller, G.H. Client involvement in goal setting and its relationship to therapeutic outcome. Journal of Clinical Psychology 32:687-690, 1976.

Windle,  C.  The  citizen  as  part  of  the  mana- 
gerial process.  In:  Schulberg,  H.C.,  and 
Jerrell,  J.M.,  eds.  The  Evaluator  and  Man- 
agement. Beverly  Hills,  Calif.:  Sage,  1979. 
pp.  69-87. 



Data-Based  Monitoring 


Homer J. Hagedorn, Ph.D.
Arthur  D.  Little,  Inc. 


Local service agency managers and data-based performance measurement systems are not yet fully compatible. Recent developments in the arts of management and program evaluation, however, suggest that a better match is possible, and that performance monitoring has a role to play in the improved relationship. Growing awareness of management concepts among agency managers helps, and new ideas have made it simpler to persuade agency managers that they manage "learning systems" as well as "structured institutions" for the delivery of social or clinical services.

Some of the constructive effect of the metaphor of the learning system (as portrayed in staff conferences, manager/employee interaction, and information gathering) comes from its emphasis on making things better rather than on overcoming managerial or worker deficiencies. The basic utility of the learning system metaphor, however, lies in the guidance it can provide as to which aspects of system behavior to monitor and how to use the information generated thereby. (What is uncertain or unknown is used to lead to learning.) Therefore, the learning system metaphor is consistent with the use of performance monitoring systems.

What is a data-based monitoring system? As we use the term, it is a critical and constructive use of program data by funders and managers. It is a set of procedures for continually gathering and organizing information related to chosen measures of performance, which will permit program managers and those to whom the program is responsible to know whether they are moving toward specified and (somewhat) quantitative goals as well as how they could improve the program. To advance the use of data-based monitoring, agency management needs to be open to its own need to learn, sensitive to crucial differences among managers in learning style and in modes of receiving and defining credible data, and responsible for obtaining needed data directly rather than through delegation.
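The core of this definition (continually gathering observations on chosen measures so that managers can tell whether they are moving toward quantitative goals) can be sketched in a few lines. Everything here is hypothetical: the `Indicator` class, its fields, the sample indicator, and the rule that "moving toward the goal" means the latest reading is closer to the target than the first are illustrative assumptions, not part of the chapter. The sketch answers only the first question in the definition; how the program could be improved still requires the managerial interpretation discussed in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Indicator:
    """Hypothetical chosen measure of performance with a quantitative goal."""
    name: str
    target: float
    readings: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        """Continual gathering: append one period's observation."""
        self.readings.append(value)

    def moving_toward_target(self) -> Optional[bool]:
        """True if the latest reading is closer to the target than the first.

        Returns None until two readings exist, since a single observation
        says nothing about direction.
        """
        if len(self.readings) < 2:
            return None
        first_gap = abs(self.target - self.readings[0])
        latest_gap = abs(self.target - self.readings[-1])
        return latest_gap < first_gap


# Hypothetical example: percent of new clients seen within 7 days of referral.
timely = Indicator("seen within 7 days (%)", target=90.0)
for value in (62.0, 70.0, 75.0):
    timely.record(value)
print(timely.name, "moving toward target:", timely.moving_toward_target())
```

Comparing only the first and latest readings keeps the sketch simple. A real system would more likely look at trends over several periods and check that the series keeps a consistent meaning over time.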

The manager in a learning system can delegate the actual gathering of information and data, but he or she cannot delegate the responsibility for defining and comprehending what the data are and what constitutes relevant information when those definitions and descriptions have previously been left implicit, intuitive, or vague. Ironically, the agency managers who most need to increase their openness, their sensitivity to differences in learning style, and their willingness to pursue actively the information they need are those least likely to recognize or attempt to remedy their deficiencies. The practical objectives of this chapter are to make social program managers and program evaluators comfortable with addressing their own learning styles and to provide broad guidance on how to adapt to data-based performance monitoring.

This author has had many opportunities to compare the ways in which business firms and publicly funded planning and human services agencies use quantitative data and other factual information in management. Although business firms generally are more sophisticated than public agencies about selecting, monitoring, reporting, and using factual information to reveal operating conditions and trends, business firms, like social programs, experience difficulties in agreeing on trustworthy indicators and in getting them used consistently. The major difference between the two types of organization lies in how much the information system emphasizes protection for the program manager against other groups such as clients or funding sources. Managers of both types of organizations intend to make their organizations responsive to their control. Managers of public agencies, however, obtain much of their formal information and most of their quantitative information from personnel records, accounting reports, purchase of services contract documents, or grants management reports designed or shaped by outside requirements (e.g., protecting taxpayers from theft, fraud, and misappropriation). Because the agency manager is among the persons least vulnerable to punishment and most able to turn things to his or her own advantage, it is not surprising that these systems are used in a manner that protects the agency manager from punishment. Accordingly, there is less opportunity for, and less emphasis on, providing the agency service delivery manager than the private sector manager with the means to manage for productivity and operating results.

Managers in businesses that set objectives are vulnerable when they fail to produce planned operating results. Accordingly, business managers want to understand how their actions affect the numbers, and they want to be reasonably sure that series of numbers have consistent meaning over time. Because there is often controversy or indecision rather than consensus on what operating results are really expected from public programs, public program managers traditionally are allowed to be less vulnerable to the consequences of failing to achieve planned operating results. This situation may be changing toward greater emphasis on operational productivity and hence also greater utility for performance monitoring in public programs.

Management still seems to be more an art than a science, although much detailed research into managerial behavior has taken place over the past 15 years. Some of this work makes it seem considerably more legitimate than in the past to compare public and private sector managers, because there is evidence of similarities among a variety of types and levels of managerial jobs (Mintzberg 1973). Mintzberg and others find that managerial jobs tend to be highly fragmented, that managers characteristically rely heavily on face-to-face oral communications or direct observation, and that a great deal of managerial activity is at once too specific and too complex to fit easily into the classic management categories (planning, organizing, coordinating, and the like). The concept of social learning (Bandura 1977) also is being applied to managerial behavior, thereby enriching, and demonstrating in human action, some of the assumptions of cybernetics, sociotechnical systems analysis, and behavioral psychology (Davis and Luthans 1980) that are used in this chapter.


Public  Sector  Managers’  Understanding  of 
Data-Based  Monitoring 

Social program managers are, to some degree and in an important sense, fated to be inherently incapable of "knowing" the potential of data-based monitoring systems. Locally designed performance measurement systems reflect the information gathering and processing proclivities of the managers who order their development and who hope to use the information produced. On the other hand, performance measurement systems designed by information system experts and imposed on an agency by a higher administrative level reflect the designers' proclivities. Until at least a few examples of a performance monitoring system have been completed, tested, and used long enough to become accustomed management tools, no one can literally know how much or how little value the system will have.

It may be less obvious that the content and functioning of the system are also literally unknowable in advance of completing the system. Because system specifications lack broadly comprehensible detail before the system is built, biases and proclivities have relatively free rein. While these attitudes can be communicated and thereby known, their implications for a monitoring system have been both vague and ambiguous, although nonetheless potent. One of the exciting developments to surface in the past few years is the possibility of describing these previously inaccessible (subjective) proclivities of managers and designers, and then compensating for differences, changing system parameters, or better matching the proclivities of users and designers.

Managers, whom performance monitoring systems are supposed to support, feel little need to explain to a system designer either themselves or the foundations they intuitively recognize beneath a valid information system. Designers who pride themselves on rationality and value verbal explicitness may explain too much. Although Cvetkovich (1981) found that a feeling of social responsibility or accountability resulted in increased rational verbal explanation to those to whom accountability is felt, the relative weakness of such tendencies leaves neither manager nor designer in a good position to facilitate the work of the other. We report here some insights and tools that can be used to minimize or overcome this predicament.

Two  common  but  radically  different  sensory 
approaches to data-based monitoring systems are of interest: that of the "kinesthetic" manager and that of the "visualizer." Although only these two examples are discussed in this chapter, an almost infinite number of different possibilities actually exists in the typology from which they are drawn (Dilts et al. 1980). One thesis of this chapter is that the data-processing strategy of the manager who creates the monitoring system (a strategy that can be inferred from observable behavior patterns) and the assumptions behind the data-based monitoring system need to become compatible if the manager is to make effective use of the system. Another thesis is that managers and designers who are conscious of the need for compatibility, and of the sensory strategies that define what compatibility requires, gain the opportunity to overcome the communications barrier between the manager (who is not accountable to the designer) and the designer (who in some sense is accountable to the manager or to other managers).

Individual managers, including agency directors, have deeply ingrained, highly individualistic approaches to data definition, data selection, data manipulation, and data utilization. The approaches that managers use have not necessarily been the object of their own introspection and indeed may be beyond their conscious awareness. Nevertheless, differences in approach can exist, can be noticed, and can be coped with or otherwise used for guidance. Sproull and Larkey (1979) have discussed the need for the evaluator to match his or her approach with that of the manager. The idiosyncrasies of approach that appear to get in the way of communication between evaluator and manager include cognitive or logical errors, such as biased interpretations based on hindsight, halo effects, overgeneralization from small samples or from the most recently experienced events, and attributing too little significance to very large samples.

Data for decisionmaking also can be mentally processed as though the only data available were visual, kinesthetic, or mediated through some other sense, and the only possible processing modes were the corresponding senses (i.e., through visual symbols, kinesthetic "feel," etc.). A practical point is that the kinesthetic approach actually tends to be hostile to setting up formal performance monitoring systems. The visualizer's approach, on the other hand, is prone to misleading assumptions that exaggerate the ease of developing a satisfactory monitoring system, and thus leads to other problems.


Many individuals use only one of their sensory systems in making decisions under conditions of stress, uncertainty, and inadequate information. The naturally best decisionmakers, however, process data sequentially, in a series of different sensory modes or vocabularies. They might first size up the situation (kinesthetic). Next they talk to themselves about similar situations they have been in. Then they see whether they get any good ideas about what to do (visual). Finally, they choose the option that looks best (visual), but before they act they check whether they feel comfortable about what they are about to do (kinesthetic). The work that supports my belief that relatively unskillful decisionmakers conduct their decision processes primarily in the vocabulary of only one (or occasionally two) sensory modes is described in Neurolinguistic Programming (Dilts et al. 1980). Although this model is based on observation rather than experiment, Dilts et al. found that certain behavior patterns, such as characteristic eye movements, consistently accompanied statements about particular sensory data processing modes.

Kinesthetic  Managers 

For the kinesthetic manager (keep in touch, feel your way), monitoring is a response to trouble rather than a method for managing: it reacts to failure or difficulty by trying to prevent a repetition of the troublesome occurrence. If a system evolves from this kind of monitoring, the system is a mere consequence of the manager's need to defend himself or herself, rather than a means toward a positive objective. In other words, managers who feel their way tend not to expend their energies on developing a monitoring system, nor will they allow anyone else to challenge their prerogatives by initiating such a system. They are likely to assert that the system "will just turn out to be a waste of time" and thus are inclined to oppose, and eventually to stop, the process of system development.

The inductive and empirical approach of the kinesthetic manager can, however, result in developing a version of informal and necessarily subjective performance monitoring that both serves a purpose and minimizes overhead. Kinesthetic-derived monitoring systems depend on direct contact between data and users, requiring that management be directly in touch with key subordinates and with program operations. The kind of monitoring system that evolves in response to a kinesthetic manager therefore requires that the management team (including the top manager) be particularly adept and sensitive in person-to-person gathering and processing of information. In fact, in a system dominated by a kinesthetic manager, the managerial team is its own monitoring system.

System limitations include the consequences of uncontrollable variability, stringent constraints on formal reporting capacity, tenuous credibility to persons outside the system, and inherent difficulties in passing information upward if several layers of management or funding officials exist above the principal kinesthetic manager. Furthermore, time series tend not to be reported on the same basis in successive time periods, and data are filtered by standards that both shift and are partly beyond the awareness of the real monitors, the managers themselves. Therefore, reports sent to outsiders are often confusing when compared with earlier editions of allegedly similar reports. Capacity for data gathering is large, but processing eliminates 90 percent of the data because they simply are not noticed, even though they have been gathered.

As for those who are assigned by their superiors to be living monitors (performance data reviewers, processors, and reporters), they actually serve kinesthetic management as agents of operating control rather than as staff for information selection, processing, and reporting. These living monitors are media for management messages about control, punishment, performance standards, or keeping busy. They do not function as the objectified eyes and ears of management but as symbols of the presence of active management power, and the kinesthetic manager would describe them not as monitors (a passive role) but as messengers and as means for getting the job done. These agents of management cannot function as neutral eyes and ears (data-gathering monitors), because other staff do not allow them to do so. Instead, they are recognized by persons at all levels of the institution as "finks": inherently unreliable as reporters, because they are subjected to corrupting (or at least policy-laden) influences to which they succumb at reasonable opportunities, because that is their job. So they are fed data selectively, thereby robbed of the opportunity to be objective, and they also become the proximate cause of ever more violence committed by the kinesthetic manager on organizational subordinates when he or she senses that "real" performance data are not forthcoming.

The prevailing organizational myth, and part of the governing context in such a situation, is that "everything is political." There is, and can be, no small talk, no neutral information exchange, no abstract conversation. Objective monitoring systems are therefore unlikely to be brought into existence in this mode. Management itself will politicize every act and process, including its own self-created official monitors, and will thus ensure that any monitoring function becomes purely political, whether embodied in persons, machines, or sociotechnical systems. The managers themselves think they know precisely what they have in the "information system" (namely, a number of political reeds on which to lean). They simply have no faith in what they might get from a statistical information system that does not already exist in palpable form, and they are usually able to keep themselves beyond the reach of any evaluator who tries to move them toward what the evaluator would proclaim to be a more objective view.

A determined proponent of performance monitoring could promote and ultimately build up a monitoring system in an agency led by kinesthetic managers. Such a system will be tolerated, however, only if it sends the right messages to the staff, because its actual function is to affect the staff rather than to inform the manager. The information it gathers, processes, and duly lays before the manager is likely to be disregarded ("I never have time to look at the reports"). In other words, the kinesthetic manager is not only likely to develop his or her own systems and to think of them primarily as useful reminders of management authority to subordinates, but also likely to think of systems suggested by others in the same way. Since these managers simply will not think of monitoring systems merely as sources of information, progress in improving their use of systematic, aggregated performance measurement is slow.

Performance monitoring as actually conducted by kinesthetic managers can be surprisingly elegant, despite its seeming lack of sophistication in quantitative terms, because these managers easily cut through masses of conflicting information. They are in little doubt about how to weigh incompatible elements and come to balanced conclusions; they know without formal calculation what "feels solid." By the same token, however, their responses to much of the information from monitoring systems will be akin to those of the State legislators found by Feller et al. (1979) to regard all of their "scientific" information
sources as biased and politicized by self-interest.

Monitoring systems natural to kinesthetic managers are characterized by open-ended data bases. Where politicization of information and "what feels right" are the principal criteria for judging the utility and quality of information for decisionmaking, the information that is acceptable or useful will vary greatly from issue to issue and from time to time. The result is to keep the relevant data base open-ended and susceptible to quick shifts in definitions, criteria, standards, indicators, measures, and content. This variability persists even when program objectives are kept apparently constant. Established agency bottom-line measures of control or progress, such as budgets or head counts, however, are found acceptable for both control and measurement by even the most consistently kinesthetic managers.

In summary, politicization of information and the requirement that the data base and the interpretive apparatus be open to instant revisions and additions tend to make an agency dominated by kinesthetic management a relatively poor place to develop a formal system of data-based performance monitoring.

Visual  Managers 

At the opposite extreme are the managers who process data primarily in terms defined by their visual apparatus ("As I see it"; "In my view"). These individuals' system requirements are defined by their vision of what they have seen or want to see, rather than by their touch or feeling for what is going on. It is relatively easy for visualizers to construct whatever vision they want, even though they may compress, distort, or otherwise modify reality in ways that deceive them about how easy it would be to devise a system that would acquire precisely the information they need ("I can see it in my mind's eye").

Whereas kinesthetic managers doubt that a monitoring system modeled on and abstracted from reality can ever accurately portray what is going on, visualizers imagine that it is easy to think up a good system that will show them just what they need to see. But the visualizer's casually convincing ideas for getting really useful information usually turn out poorly. Building the data system takes more work, and yields less reliability and utility, than was so readily visualized in advance. At the point of depressing discovery of the system's limitations, the visualizer-manager loses interest in the system or blames the system designer for the failure. Worse still, the visualizer may, in an embarrassment of misplaced egotism, say he or she wants the system implemented even though experience already has demonstrated its futility. The visualizer wastes resources, may convey a false sense of security to subordinates, and imparts a sense of his or her own incompetence to others, who sense that the system can never accomplish the monitoring goals of the visualizer-manager.

Neither the extremely kinesthetic nor the extremely visual manager can individually develop a usable monitoring system, because neither is able to recognize or acknowledge a satisfactory one. Perhaps 10 to 30 percent of managers who direct large social programs literally cannot know (cognitively) what they can anticipate in or derive from monitoring systems. It takes an adroit, persistent, and discerning evaluator to give such managers the experience of useful monitoring, through which they can most readily develop a pragmatic appreciation and understanding of what monitoring can do for them.


Guidance  for  Managers  and  Systems  Designers 

Although most agency and social program managers use more sophisticated decision strategies, only about 10 to 30 percent can readily accept a new data-based performance monitoring system and can significantly facilitate its development. At a minimum, a manager needs to be candid about whether he or she can really accept any monitoring system and whether it is reasonable to expect to learn to use a monitoring system in advance of having one. It makes sense to start slowly, move cautiously, keep it simple, and elicit frequent feedback from the system designer, especially with respect to the items that designers and evaluators regard as most obvious. Not every manager believes what numbers say, but many evaluators and system designers do. It can be tricky for a manager to cope with what he or she may regard as the evaluator's naive faith in abstraction and quantitative absolutism. The manager wants the monitoring system to enable him or her (1) to be and appear to be an active, informed, and determined problem solver; (2) to be equipped to defend the program; and (3) to be sure that operations
are  within  tolerable  limits,  all  at  a monitoring 
cost  that  is  reasonable. 

As long as the manager remains sufficiently skeptical, none of these potential benefits automatically leads to the use of monitoring in management. Assuming that a data system is a good idea in the first place, it is useful to be able to distinguish those managers with relatively extreme approaches to their own data processing. These managers have the most difficulty in accepting newly developed monitoring systems and tend to reject many systems even when these systems are capable of providing useful information. This managerial avoidance is worth reducing, so it is valuable to be able to predict its occurrence.

Such managers have grossly unreasonable expectations about formal information systems; they reveal in their conversations that they expect far too much, far too little, or the wrong things from such systems. They also tend to keep themselves perpetually ignorant about the process of system development, in particular maintaining aloofness from, or misunderstanding of, their own necessary role in the crucial work of defining the functional specifications for the system. Last, they exhibit a strong preference for a single-step mode of processing sensory data.

Several points are apparent here: (1) Managers for whom performance monitoring systems are particularly difficult to develop are of more than one type. (2) Some very good managers are not easy to work with in developing performance monitoring systems. (3) Both managers and system designers can usefully follow clues toward developing the high level of communications rapport necessary for a successful monitoring system. (4) Managers who are adaptable and able to use data-based performance monitoring systems well employ decision strategies that embrace several sensory modes. It may be that anyone can operate or use a performance monitoring system, but such systems are necessarily developed for and by someone in particular, and those persons and the system designers simply must come to terms.

Once system development gets to the point at which genuinely useful information is reaching the manager, distinctions concerning sensory system strategies become less important. The designer and the user now have at least a few good examples of genuine communication. It will thereafter be easier to decide whether they are on the right track and making progress.


Requirements  for  Practical  Performance 
Monitoring  Systems 

Social program managers' expressed needs for information are, paradoxically, simpler than most evaluation methodologies will support. They include:

• Anecdotes  that  suggest  the  best  aspects 
of  the  program,  to  offer  in  justification  of 
continued  support 

• Examples of worst cases and allegedly typical cases against which to sharpen up policy

• Specific descriptions of successful programs that middle managers and local leadership can learn from

• A believable  way  to  describe  what  the 
program  is  and  does  and  how  much  of 
each  activity  is  going  on 

These  needs  are  interesting  in  that  none  of 
them  says  a word  about: 

• Managers’  skills  and  the  use  of  these  skills 

• Changes  one  might  make  to  improve 
program  management 

• Managers'  need  for  feedback  to  permit 
them  to  manage  the  program  better 

The real strength of data-based monitoring has always been the feedback it can give to a manager or funding agency interested in using information to improve management. Thus, what managers actually want is something of a disappointment. What I as a management consultant would like to sell is management self-improvement; what I find that managers with money will usually buy is program justification and other information or procedures that will satisfy outside inquisitioners. Social programs stay in business if they get money to support themselves. Accordingly, social program managers will buy information systems if they need them to provide adequate audit trails or billing data required by public or other third-party financial sources. Some social program managers, even in these straitened times, continue to explore the advantages of automated information systems for providing data dumps to researchers and planners.

The time has yet to arrive when social program managers clamor for monitoring systems capable of helping them to manage better. My experience in contract research and consulting suggests that, before authorizing use of a monitoring system as a vehicle for program improvement through management action, a program manager has to become very comfortable with an evaluator, deeply involved with his or her own responsibilities for getting some utility out of the monitoring system, and persuaded of the managerial competence of the evaluator. This is true even when the evaluator is on the manager's own staff. It is even more true when the evaluator represents some higher level of the overall program of which the agency is a part.

Evaluators make the following kinds of remarks about their managers in problematic situations: (1) Managers do not have proper respect for data; (2) Managers do not understand systems; (3) Managers are slow to make decisions. Managers in turn make equally stylized remarks about evaluators: (1) Evaluators too often get lost in methodology and mired in unimportant issues; (2) Evaluators rarely produce good enough answers soon enough to be useful; (3) Evaluators exaggerate the utility and applicability of their methods for assisting in decisions. When these stereotypes remain strong enough to obscure communications between manager and evaluator, managers are unlikely to entrust the evaluator with responsibilities for management improvement through monitoring systems, even if the manager wants such improvement. Generally, managers seem to want nothing of the kind. Some typical attitudes are the following:

• Program improvement through performance monitoring is not a high-priority objective for the manager. If a manager, particularly an executive director, is giving high priority to acquiring funds from third parties, he or she almost has to believe that what management can do to ensure good results is already being done; otherwise he or she could not in good faith continue to vouch for the funds being requested.

• Program improvement is not viewed as a potential outcome of using a formal performance monitoring system.

• Program improvement is viewed as within the scope of the manager's responsibilities only insofar as he or she needs to obtain resources. Agencies that provide professional services operate collegially. Professional/clinical considerations constrain the manager. Respect for professional expertise means that the manager believes that management control is harmful to good work by the professional staff, or he or she believes that managerial initiatives should proceed from a level other than the manager's.

• Managers are unaware of the gap between what they actually know and what they would benefit from knowing and could find out from specific information-gathering activities.

• Performance  monitoring  is  perceived  by 
the  agency  manager  as  one  more  dis- 
traction, a yielding  to  unproven  claims  by 
program  evaluators. 

• Performance  monitoring  is  seen  by  the 
agency  manager  as  an  unwanted  en- 
croachment on  management. 

Managerial problems tend to be perceived by managers as very specific, very short term, and subject only to incremental adjustment (Sproull and Larkey 1979). Performance monitoring systems satisfy these management requirements when they measure progress toward objectives that managers regard as important, provide rapid feedback (or present data that anticipate managerial questions), and offer information definite enough to be useful. The evaluability assessment principles espoused by The Urban Institute staff (Wholey 1976; Schmidt et al. 1978) are useful in making performance monitoring potentially credible to managers at local and other levels of program management, because they are obviously sensible. They imply reasons for not going too fast and provide a rationale for integrating a monitoring system into an overall management process: evaluate programs only when they have developed far enough to make evaluation sensible, says The Urban Institute. Evaluability requires that measurable, time-bound objectives be set for or by the program, that quantitative indicators be available, that measures be decided on, that appropriate data and a data-gathering system be in place, and that management participate in the decisions. In a situation that meets these requirements, rapid feedback should, it is hoped, also be possible.
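Read as a procedure, the evaluability preconditions above amount to an all-or-nothing checklist. The short Python sketch below illustrates that reading only; the class ProgramStatus, its field names, and the helpers unmet_requirements and is_evaluable are hypothetical and are not drawn from The Urban Institute's materials or from this chapter. Reducing each condition to a yes/no flag is a simplifying assumption, since in practice each one calls for judgment.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProgramStatus:
    """Hypothetical readiness record; one flag per evaluability condition in the text."""
    measurable_time_bound_objectives: bool   # objectives set for or by the program
    quantitative_indicators_available: bool  # indicators exist for those objectives
    measures_decided: bool                   # specific measures have been agreed on
    data_system_in_place: bool               # appropriate data and a data-gathering system
    management_participates: bool            # management takes part in the decisions


def unmet_requirements(status: ProgramStatus) -> List[str]:
    """Return the names of the conditions not yet satisfied, in declaration order."""
    return [f.name for f in fields(status) if not getattr(status, f.name)]


def is_evaluable(status: ProgramStatus) -> bool:
    """Treat a program as ready for evaluation only when every condition holds."""
    return not unmet_requirements(status)


if __name__ == "__main__":
    # Example: everything is in place except agreement on specific measures.
    pilot = ProgramStatus(True, True, False, True, True)
    print(unmet_requirements(pilot))  # ['measures_decided']
    print(is_evaluable(pilot))        # False
```

Reporting which conditions are unmet, rather than a bare yes or no, fits the advice not to go too fast: it names what must be settled before evaluation makes sense.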

The experience of Arthur D. Little, Inc. (1981) in helping the Bureau of Health Planning to implement a rapid feedback system, based on earlier work of The Urban Institute in the national health planning program, reveals some additional requirements for actually achieving rapid feedback. Substantial progress was made in setting up a special-purpose program performance monitoring system to provide rapid, specific answers to specific questions and to help resolve current issues of managerial and policy significance.

This  turned  out  to  require  multiple  cycles  of 
questioning,  or  providing  an  improved  context 
for  questioning,  in  order  simultaneously  to 
provide  good  information  and  to  meet  the 
deadlines.  This  incremental  approach  to  the 
rapid  feedback  performance  monitoring  sys- 
tem seemed  to  be  provisionally  vindicated,  if 
managers  could  accept  the  need  to  identify 
and  grapple  with  newly  noticed  gaps  in  in- 
formation at  nearly  every  step  in  the  process. 
Sufficient  progress  kept  hopes  alive  that  the 
information  to  be  gathered  would  be  specific 
enough,  current  enough,  and  well  enough  at- 
tuned to  operating  realities  to  permit  sensitive 
use. 

The evaluability assessment approach therefore appears to be an
important development in the management arts. This kind of
approach is among those needed to develop and sustain
performance monitoring systems in social programs. Rapid
feedback based on evaluability assessment can provide program
justification materials and summaries on the status of
particular activities fast enough and well enough to survive
some of the other vulnerabilities in monitoring systems. A
rapid feedback system provides one useful entry point for
actually trying out data-based performance monitoring as a
device for program improvement through managerial learning.

Management  Information  Systems  and 
Management  Self-Improvement 

Each of us who has ever recommended building a management
information system (MIS) and then had to live through its
development knows how poor a route this is to instant
gratification. MIS is also a poor route toward a managerial
decision to use data-based monitoring as a tool of program
management. The real problem is to find indicators that the
manager trusts enough to use, especially when the program is
far-flung, complex, and varied. In other words, the real
problem is still the classic one: becoming precise about what
is to be managed. A simple cyclical progression works here as
well as with evaluability assessment. It comprises (1) initial
questions whose answers will constitute information needed to
manage more effectively, (2) initial data gathered to answer
the questions developed in step 1, and (3) changes in
management behavior intended to overcome problems revealed or
verified by the data just gathered. Then a new cycle commences,
with better questions, more focused answers, and more precisely
targeted management responses that modify the situation in
still better anticipated and more narrowly confined ways.

This sequence suggests that a data-based monitoring system is
properly begun with management's questions, rather than with
the goals of the program or with performance indicators. A
budget-conscious manager who needs to justify management costs
(overhead) to skeptical funders might look to program
objectives and performance measures to find out whether they
provide good justification for spending money on a performance
monitoring system. The possibility of arguing that potential
overall savings would justify system cost can look attractive.
But developing the system incrementally, in a way that quickly
begins to provide answers to management's important questions,
would still be preferable.

This suggests where MIS fits into a developmental sequence that
would include: (1) a system to answer specific questions of
importance to management, (2) a process for converting the
experience and results of answering individually important
questions into a first approximation to performance monitoring
(e.g., a rapid feedback system), (3) study of the performance
monitoring system and its use in order to adjust it to the
phenomena actually being managed, and (4) development and
preliminary design of the MIS.

Starting to design the MIS earlier in the sequence usually
reflects pressure to automate bookkeeping or related business
functions, to deal with overwhelming volumes of financial and
nonfinancial numbers, or to improve financial control. These
reasons can reflect compelling pressures. The consequence,
however, is a system in which management information, if it
exists at all, is likely to take a back seat to data for
day-to-day bookkeeping, billing, and cost accounting. The
practical solution is generally to develop or accept automated
bookkeeping or other recordkeeping systems in order to achieve
the significant gains they afford, but to recognize that these
systems cannot be developed into performance monitoring systems
(even when labeled MIS) until precision is achieved on classic
management issues. These include the following:

• Are  we  doing  and  accomplishing  what  we 
say  we  are  doing  as  an  organization? 

• Do  we  want  to  do  more  (or  less),  or  should 
we  change  what  we  say  we  want  to  do? 

• What  ways  are  open  to  us  to  permit  us  to 
do  more  (or  less)? 

• Is  current  output  worth  the  effort?  Is  the 
incremental  output  of  a change  in  ap- 
proach worth  the  effort? 

• Are  we  managing  those  aspects  of  or- 
ganizational behavior  that  make  sense  to 
manage? 

• How  much  additional  leverage  on  change 
can  we  obtain  by  doing  more  of  what  we 
have  been  doing  as  managers? 

• What is the comparative yield of potential
management actions "a" and "b" (one of
which we have been doing, the other of
which is either new or different)?

As more public programs begin to use fourth generation computer
systems, with the great increases in speed, flexibility, and
memory associated with them, new possibilities arise for
performance measurement systems and for many other management
improvements. In fact, the user-directed systems made possible
by the new computer technology mark the first time in this
century that technology is driving the form and content of
organization. Part of what is going on is the relegation of
routine operations and recordkeeping to the computer. By the
same token, however, much data that were previously
inaccessible or nonexistent now are being recorded in orderly,
accessible data bases. These data are often in convenient form
for evaluators to work with, at least for prototyping new
performance measurement systems that can be more fully
developed through just the sort of iterations argued for in
earlier sections of this chapter.


Avoiding  Measurement 


A contemporary philosopher has restated a useful rule about
indicators and measures: the fact that management has fixed on
an indicator to use as a measure ensures its invalidity
(Hulswit 1981). In social programs, with nearly infinite
opportunities for subtle changes in client selection,
transactional intensity, therapeutic/educational/social
services mix, or case management mode, subverting one's real
purposes to meet prescribed measures has been a way of life
whenever needed to meet funding pressures or other exigencies
of program justification. If the measure is job placements,
take only clients who can be placed. If the measure is change
in SAT scores, train on SAT materials. If the measure is
customer satisfaction, sample repeat customers and do not
pursue dropouts. Should we despair of "measurement"? One
alternative to despair is belief in the eventual development of
a set of measures sufficiently interdependent that optimizing
any one or two criterion values will make the others worse.

Conclusion

This chapter has outlined several aspects of one subject: basic
managerial phenomena that constitute barriers to performance
monitoring, and progress in dealing with some of them. Some of
these phenomena can be dealt with more effectively than was the
case as recently as 5 years ago. All can be dealt with better
when they become the subject of active managerial effort. There
is still a long way to go, but there is also a hopeful message
in this chapter. That message is a corollary to the fact that
directors of local and more broadly based social programs are
now being forced to become active managers. Social program
managers will experience intense pain if they fail to become
effective in managerial roles that are newly intensified by
financial pressure and the directives of governance bodies. The
pain of failure in developing and using reasonable performance
monitoring systems is likely to be more intense for managers
than the pain of confronting the phenomena outlined in this
chapter. The state of the management arts is good enough to
support considerable application of performance monitoring if
and as individual managers want to face up to the difficulties
involved.


References



Arthur D. Little, Inc. Contract report to the
Health Resources Administration on the
performance evaluation program. J. Hagedorn,
Project Director, Contract No. HRA-232-79-0120,
1981.

Bandura, A. Social Learning Theory. New
York: Prentice-Hall, Inc., 1977.

Cvetkovich,  G.  Cognitive  accommodation, 
language,  and  social  responsibility.  Social 
Psychology  44(2):  149-155,  1981. 

Davis, T.R.V., and Luthans, J. Managers in
action: A new look at their behavior and
operating modes. Organizational Dynamics
9(1):64-80, 1980.

Dilts, R.; Grinder, J.; Bandler, R.; Bandler,
L.C.; and Delozier, J. Neurolinguistic Programming:
Volume 1, The Study of the Structure of
Subjective Experience. Cupertino, Calif.: Meta
Publications, 1980.

Feller,  I.;  King,  M.R.;  Menzel,  D.C.;  O'Connor, 
R.E.;  Wissel,  P.A.;  and  Ingersoll,  T.  Scien- 
tific and  technological  information  in  state 
legislatures.  American  Behavioral  Scientist 
22(3):417-436, 1979. 


Hulswit,  F.T.  Personal  communication  to 
author  by  Arthur  D.  Little,  Inc.,  colleague, 
1981. 

Mintzberg,  H.  The  Nature  of  Managerial  Work. 
New  York:  Harper  and  Row,  1973. 

Schmidt, R.W.; Scanlon, J.W.; and Bell, J.B.
Evaluability Assessment: Making Public
Programs Work Better. Washington, D.C.:
The Urban Institute, 1978.

Sproull,  L.,  and  Larkey,  P.  Managerial  be- 
havior and  evaluator  effectiveness.  In: 
Schulberg,  H.C.,  and  Jerrell,  J.M.,  eds.  The 
Evaluator  and  Management.  Beverly  Hills, 
California:  Sage,  1979.  pp.  89-104. 

Wholey,  J.S.  The  role  of  evaluation  and  the 
evaluator  in  improving  public  programs:  The 
bad  news,  the  good  news,  and  a bicentennial 
challenge.  Public  Administration  36:679- 
683,  1976. 




Dysfunctional  Potentials  of  Performance  Measurement* 


Pauline E. Ginsberg
Utica  College  of  Syracuse  University 


The desire to base action upon information resulting from
formal study or quantitative data systems must be tempered by
attention to the effects of the information gathering process
itself. Failure to scrutinize possible dysfunctional side
effects of data collection in the application of quantitative
indicators to bureaucratic decisionmaking may have the
unfortunate result of serving bureaucracy at the cost of
knowledge. Early concretization of poorly understood constructs
(e.g., health, quality of life, cost effectiveness, optimum
level of functioning) to aid instrumental decisionmaking may
introduce distortions in the data that make the ultimate goals
of understanding and providing services more difficult to
achieve. It is crucial that efforts to demonstrate the worth of
social programs do not unwittingly undermine the very services
they seek to support, and equally crucial that the need to
justify continued funding neither blinds us to program
weaknesses nor inhibits our ability to search for the means to
program improvement.

Application of principles of human motivation and social
psychology to administrative settings should make unwanted side
effects predictable and enable us to identify those
circumstances under which feedback effects from bureaucratic
use of quantitative indicators produce more harm than good, but
this has not yet been achieved. A brief review of the
literature (Ginsberg 1982) provides convincing evidence that
dysfunctional side-effect generation is pervasive. It further
demonstrates that attempts to intervene and correct such side
effects typically misperceive their origins, treat the symptom
rather than the cause (Cyert and MacCrimmon 1968), and hence
exacerbate problems rather than solve them.

*This is a shortened version of an article in Evaluation and
Program Planning 7:1-12, 1984. The author is indebted to D.T.
Campbell for conversations that developed the ideas in this
paper.

Blau (1955, 1956) offered an explanation for side-effect
generation in his observation that bureaucracies have dual,
interdependent formal and informal organizational structures.
Attempts at formal control evoke compensatory responses in the
informal structure that generally lead to modification of the
original control mechanism. Ridgeway (1956) noted that where
formal control is focused upon performance indicators, these
side effects are likely to be markedly dysfunctional. Indeed,
this was found in Blau's (1955) observational study of a State
employment office, where performance indicators used in staff
evaluation (e.g., number of interviews conducted and number of
positions filled) produced unexpected side effects in job
functioning.

Similar occurrences have been observed in business (Berliner
1957; Gouldner 1954; Granick 1954, 1972), law enforcement
(Gardiner 1969; McCleary 1977; McCleary et al. 1982; Seidman
and Couzens 1974; Skolnick 1966; Zeisel 1971), education
(Becker 1961, 1968; Bogdan 1976; Cronbach et al. 1980; Stake
1971), mental retardation (Taylor and Bogdan 1980), social
services (Beck 1970; Lipsky 1980; Prottas 1979), Government
(Glass et al. 1981; Kutchinsky 1973; Pressman and Wildavsky
1973), and health care (Brown 1981; Covaleski and Dirsmith
1981a, b; Office of Technology Assessment 1980; Russell 1979).
This literature has not been sufficiently systematized to allow
development of a comprehensive theory of side-effect generation
that can be used for applied decisionmaking. A glimpse of what
such a theory might look like has been provided by Campbell
(1979):

The more any quantitative social indicator is used for
decisionmaking, the more subject it will be to corruption
pressures and the more apt it will be to distort and corrupt
the social processes it is intended to monitor (p. 85).

The ability to anticipate the side effects of particular
methods of performance measurement, to weigh one against the
other, to choose the least dysfunctional for implementation,
and/or to control the extent of dysfunction in advance would
have obvious advantages for the planner. This is the ultimate
goal of the present endeavor. The chapter "Unintended effects
of program performance measurement in Pennsylvania" (this
volume) applies the view presented here to a current use of
performance indicators in mental health.


Experiential  Evidence 

Examples of the interaction between fallible measurement and
the requirements of bureaucratic recordkeeping follow. Because
the problems described are assumed to be the rule rather than
the exception, and because it is unnecessary to single out any
particular agency or program when the examples could have come
from virtually any, source material will be documented only for
examples circulated publicly. The illustrations included were
not chosen because of their unusual nature or severity. On the
contrary, it is the author's impression that they are typical,
an impression verified by wide discussion.

Insurance  Practices 

Examples of provider manipulation to fit the client's insurance
coverage to the therapist's customary fee occur in numerous
public and private settings (McGuire and Frisman 1983). Even
where private insurance coverage for outpatient mental health
care is officially limited to treatment offered by physicians,
other mental health professionals often have been able to
obtain insurance reimbursement for their clients. This has been
possible through the actual provider entering into some form of
partnership, supervisory relationship, or cotherapy arrangement
with someone who is eligible for reimbursement. The precise
arrangement depends upon the State's legal requirements, the
client's insurance, and the wishes of the professionals
involved. Similar arrangements may be made to make the services
provided match those that are reimbursed and to bill in
categories that maximize reimbursement. Billing may be done for
a therapy session of 1 hour when only a half-hour is provided,
or several members of a family may be seen together while each
is billed separately.

Administrative actions can also be aimed at reimbursement.
Medicaid clients who are neither dangerous nor endangered may
be discharged on Friday and readmitted on Monday so that
Medicaid days are used only during times when there is full
programming, rather than on weekends when care is largely
custodial and the privately insured are at home "on pass."

Should this practice be examined statistically, the following
relationships would be found. All other things being equal,
privately insured clients' length of hospitalization would be
longer than that of Medicaid clients. The latter group,
however, would have more multiple admissions. The same figures
would show fewer readmissions among individuals with lengthier
hospitalizations. One might easily misinterpret such findings
as demonstrating the efficacy of longer hospitalization, as
definitive of differential treatment of welfare clients, or
perhaps as indicative of greater liability in the condition of
clients of lower socioeconomic status. In fact, they describe
only administrative convenience in the face of regulatory
restraint and third-party coverage.
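The statistical artifact just described can be reproduced with a toy model. The sketch below is hypothetical and uses no program's records: it assumes every client needs the same whole number of weeks of inpatient care, that privately insured clients stay continuously (weekend days "on pass" still counted), and that Medicaid clients are discharged each Friday and readmitted each Monday, so that each week is recorded as a separate 5-day admission. The function names `record_stays` and `summarize` are illustrative only.

```python
from statistics import mean


def record_stays(weeks_of_care: int, medicaid: bool) -> list[int]:
    """Return the recorded admission lengths (days) for one hypothetical client."""
    if medicaid:
        # Assumed pattern: discharged Friday, readmitted Monday, so each week
        # becomes a separate 5-day admission and weekend days are never recorded.
        return [5] * weeks_of_care
    # Assumed pattern: one continuous stay; weekends "on pass" still count.
    return [7 * weeks_of_care]


def summarize(weeks_needed: list[int], medicaid: bool) -> tuple[float, float]:
    """Mean recorded length of stay and mean number of admissions per client."""
    stays = [record_stays(w, medicaid) for w in weeks_needed]
    mean_los = mean(days for client in stays for days in client)
    mean_admissions = mean(len(client) for client in stays)
    return mean_los, mean_admissions


# Identical clinical need in both groups: 2, 3, and 4 weeks of care.
need = [2, 3, 4]
for medicaid in (False, True):
    label = "Medicaid" if medicaid else "Private"
    print(label, summarize(need, medicaid))
```

With identical care needs, the Medicaid group shows a mean recorded stay of 5 days and 3 admissions per client, against 21 days and a single admission for the privately insured. The whole difference comes from how episodes are recorded, which is exactly what a naive reading of the statistics would miss.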

Statistical  Indicators  of  Program 
Effectiveness 

In a State where statistics regarding rehospitalization and
length of stay by diagnosis were the subject of much attention,
the average length of stay by diagnosis was computed for the
State as a whole and separately for each State institution.
These comparisons were distributed along with advice to reduce
length of stay and detailed criteria for assigning clients to
diagnostic categories. Revised length-of-stay-by-diagnosis
findings were distributed at regular intervals, eventually
accompanied by a directive making average length of stay by
diagnostic category a discharge target.

Yet despite the detailed diagnostic criteria submitted with the
policy statement, enough latitude remained for considerable
discretion in their use. Clients without homes or lacking funds
for housing, and therefore expected to remain hospitalized,
were diagnosed as psychotic when possible, so that their
continued inpatient residence would remain unchallenged pending
solution of their practical problems.




This consideration for hard-to-place clients aided
institutional "progress" in that it reduced the reported length
of stay for some diagnostic categories. Although it earned
praise, the practice fed into subsequent computation of average
length of stay, changing it and producing new figures to which
the facility was later expected to conform. Thus, distortions
in diagnosis produced by early responses to the State
length-of-stay directive influenced future behavior at both the
State and institution levels.
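The feedback loop in this example can be made concrete with a handful of invented records. The diagnostic labels, day counts, and helper names below are assumptions for illustration, not data from the State in question. Relabeling one hard-to-place client lowers the reported average for the category the client leaves; if that lower figure then becomes the next period's discharge target, the distortion is carried forward.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical discharge records: (diagnosis, length_of_stay_days, hard_to_place)
records = [
    ("depression", 20, False),
    ("depression", 25, False),
    ("depression", 90, True),  # no home or funds for housing: expected to stay
    ("psychosis", 60, False),
    ("psychosis", 70, False),
]


def average_los(recs):
    """Average length of stay by diagnostic category, as a State report might compute it."""
    by_dx = defaultdict(list)
    for dx, los, _hard in recs:
        by_dx[dx].append(los)
    return {dx: mean(days) for dx, days in by_dx.items()}


def relabel_hard_to_place(recs):
    """Diagnose hard-to-place clients as psychotic so their stays go unchallenged."""
    return [("psychosis" if hard else dx, los, hard) for dx, los, hard in recs]


targets_before = average_los(records)
targets_after = average_los(relabel_hard_to_place(records))
print("before:", targets_before)
print("after: ", targets_after)
```

No client's actual stay changes, yet the depression average halves from 45 to 22.5 days, which would be reported as progress, while the psychosis average rises from 65 to about 73 days. Once the new averages are adopted as targets, the relabeling is built into every later comparison.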

Quotas  and  Target  Populations 

Concentration upon quotas and target populations yields
similar results. Clinicians who consult with schools and
parents give evidence that definitions of handicapping
conditions change with the availability of special programs.
Quotas based upon severity of impairment are common. Many State
facilities are mandated to serve the "severely and moderately
disturbed." Mildly disturbed clients and those seeking growth
are to be referred elsewhere. But this reasonable priority
creates problems. Facilities serving the poor or within the
economic reach of poorly insured members of the middle class
are, if present at all, frequently less accessible than State
facilities. Thus, convenience and economy make community mental
health centers and State outpatient facilities attractive to
those who should, according to regulations, go elsewhere. At
the same time, staffing and budgeting patterns for the
outpatient centers depend upon utilization rates. An
underutilized service, in terms of client numbers and service
hours, will suffer staff and budget cuts, while an overutilized
one will receive increases. Thus, from the perspectives of both
service seeker and service provider, the situation may be
improved by bending the regulations toward overdiagnosis. This
temptation is enhanced by the fact that healthier clients are
usually more satisfying to work with. A wise administrator who
wishes to keep staff members from seeking more fulfilling
employment will, therefore, encourage the practice.

Assessment  Instruments 

Many mental health professionals and program evaluators seek
objectivity through formal assessment instruments. One such
measure is the Discharge Readiness Inventory, which has been
used to assess the causes of multiple psychiatric admissions as
well as to assist in limiting them. In one setting, all clients
in a multiple admission category were rated by trained members
of the treatment team and were also the subjects of special
treatment review and planning sessions at which the ratings
were presented. Decisions were made regarding discharge,
followup, and treatment strategies to prevent further
rehospitalization. Because the raters were part of the
treatment team, team bias entered the rating process. Clients
known to be amenable to outpatient treatment routinely received
better scores than their current behavior would warrant, while
those known to be less diligent in pursuing outpatient care
were rated less positively. Here the value of the research
instrument as an objective assessment of the client group under
study was debased by its decisionmaking function. Yet, here
too, its value for decisionmaking was limited: more often than
it was used as data upon which to base a decision, it was used
as justification for a decision that had already been made.

Accreditation 

Changes in procedure, particularly charting, often are
introduced immediately before accreditation, when compliance is
more likely. For charts that are already up-to-date, conversion
creates extra paperwork and occasions much ill temper. Less
conscientious recorders fare somewhat better. A therapist who
was behind in charting during a period when the institution
shifted to a more goal-oriented scheme noted with pleasure the
ease with which charts were brought up-to-date. Today's
achievements could be recorded as yesterday's goals,
yesterday's achievements as fulfillment of goals set last week,
and so on.


Legislation 

Mental health workers seeking to cooperate with a city school
district in coordinating children's services were told that, in
order to comply with the freedom of information requirement,
the school system kept dual records. One set of records, which
would be released to the student or parent in compliance with
the legislation, contained grades, standardized achievement
tests, and disciplinary notices of record. The second set, open
only to school personnel and select outside professionals such
as psychotherapists, included teachers' evaluations and other
nonpublic information, including psychological test profiles.


Emerging  Patterns 

Across the literature and anecdotal material reviewed, patterns
of interaction between bureaucratic efforts at evaluation and
control and organizational responses to those efforts seem to
recur. This repetition of similar patterns suggests a
lawfulness which, if accurately described and carefully
analyzed, should be applicable to prediction of future
responses to regulatory modification. Such a predictive schema
could be used to great advantage in decisionmaking, since
alternative courses of action could be weighed not only in
terms of the problems to which they are directly addressed, but
also in terms of the side effects they are likely to engender.

Administrators are able to rely upon informal relationships
only to the degree that they are personally involved with their
subordinates. The larger the organization, therefore, the more
they must rely for control on the tools of bureaucracy. In such
organizations, administrators predictably respond with further
reliance on quantitative indicators in their efforts to regain
control (Berliner 1957; Blau 1955, 1956; Covaleski and Dirsmith
1981a, b; Granick 1954; Prakhash and Rappaport 1977; Seidman
and Couzens 1974). As Blau (1955, 1956), Berliner (1957), and
Granick (1954) so clearly illustrate, the ensuing control is
often more apparent than real, and the result is frequently a
dysfunctional focus upon a single aspect of the organization's
mission (Pondy 1977).

Prottas (1979) and Lipsky (1980) note that in human service
delivery systems "street-level bureaucrats" (the professionals
and paraprofessionals having direct client contact) perceive
managerial attempts at control as an imposition upon their
autonomy. Therefore, managerial emphasis on control frequently
results in skewed information returning to managers, rather
than an accurate reflection of client need or staff and program
functioning. Clients are produced to fill administrative
categories (Bogdan 1976; Prottas 1979) upon which policy
decisions are based.

When a feedback loop returns to decisionmakers only information
that has been distorted to fit their own categorical biases and
numerical goals, the beliefs and goals of those decisionmakers
remain unedited by conditions of the street-level world. This
also means that the likelihood of any rule, regulation, or
evaluation procedure being withdrawn once it is implemented is
practically nil (Berliner 1957; Blau 1955, 1956; Granick 1954,
1972; Ridgeway 1956).

The patterns described above depend upon assumptions for which
there is evidence but not proof. The first assumption is that
of continuing instability in human service programs. It is also
assumed that a minimax control strategy will be pursued by each
individual and each functional group, based upon interests that
will only sometimes coincide. Further, Blau's (1955) analysis
regarding the presence and behavior of formal and informal
structures is accepted. Bureaucratic tools that are
predominantly quantitative are presumed to be associated with
control in the formal structure, while qualitative social norms
are regarded as the controlling force in the informal one.
Thus, even when the interests of the two coincide, their
methods are likely to differ. This difference is assumed to be
at least partially responsible for side effects, which are
characterized more by self-perpetuating positive feedback than
by corrective negative feedback. Although side effects are
assumed to be necessary concomitants of bureaucratic action,
the question of whether those that are dysfunctional outweigh
those that promote societal goals remains open.

These assumptions can be restated as a set of related lawful
regularities:

R1. As the cohesiveness of an organization
declines, its control via group norms
also decreases and is replaced by
regulations.

R2.  To  the  degree  that  rules  and  regula- 
tions replace  norms  as  control  mech- 
anisms, evaluation  procedures  become 
increasingly  rigid  and  quantitative. 

R3.  The  stricter  the  top-down  or  machine- 
model  control,  the  greater  the  inse- 
curity of  staff  members. 

R4.  Where  discretion  is  limited  in  one  re- 
spect it  will  be  used  in  another. 

There  are  two  corollaries: 

C4A. When it is not against the rules to use
discretion, those who do so will take
responsibility for decisions and will be
held accountable. When it is against
the rules, the use of discretion will
continue but be buried in the bureau-
cratic process, particularly in ex-
cessive use of recordkeeping.

C4B. That discretion not consumed in se-
curing the workers' needs will be used
in the interests of clients.

R5.  Where  professional  power  is  great  or 
where  regulation  takes  place  at  a 
distance,  professional  norms  are 
translated  into  bureaucratic  steindards. 

Campbell's (1979) relevant "laws" were stated previously:

R6. "The more any quantitative social indicator is used for decisionmaking, the more subject it will be to corruption pressures,"

R7. "and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

The effort of street-level bureaucrats to get services to clients includes the need to meet quantitative standards. The effort of administrators to obtain funding and to demonstrate accountability also requires meeting quantitative standards (Campbell 1969, 1971, 1979; Caplan and Nelson 1973; Caro 1980; Deutscher 1977; Deutscher and Gold 1979). Through the biases induced by the need to conform to those standards, the meaning of the standards themselves changes. This change is not uniform from time to time or from place to place. Thus, when conformity to rigid standards is used in decisionmaking from year to year and from program to program, the usual ceteris paribus assumptions used in assessing the reliability and validity of measurement are no longer plausible. Once specific scores on fallible indicators become ends in themselves, those scores can be achieved through any of the indicators' components: true score, error, and any number of biases. Despite our care and attention to the psychometric properties of evaluation instruments, indicator corruption is the inevitable side effect of decision pressure. Our less-than-cautious use of evaluation tools may create and perpetuate systematic biases that undermine our efforts to improve program management. It does not follow that we should abandon the evaluation enterprise. Rather, we need to scrutinize our own behavior judiciously to understand its contribution to larger values.


References 

Beck, B. Cooking the welfare stew. In: Habenstein, R.W., ed. Pathways to Data: Field Methods for Studying Ongoing Social Organizations. Chicago: Aldine, 1970.

Becker, H.S.; Geer, B.; and Hughes, E.C. Making the Grade. New York: Wiley, 1968.

Becker, H.S.; Geer, B.; Hughes, E.C.; and Strauss, A.L. Boys in White: Student Culture in Medical School. Chicago: University of Chicago Press, 1961.

Berliner, J.S. Factory and Manager in the USSR. Cambridge, Mass.: Harvard University Press, 1957.

Blau, P. The Dynamics of Bureaucracy. Chicago: University of Chicago Press, 1955.

Blau, P. Bureaucracy in Modern Society. New York: Random House, 1956.

Bogdan, R. National policy and situated meaning: The case of Head Start and the handicapped. American Journal of Orthopsychiatry 46(2):229-235, 1976.

Brown, L.D. Competition and health cost containment: Cautions and conjectures. Milbank Memorial Fund Quarterly 59(2):145-189, 1981.

Campbell, D.T. Reforms as experiments. American Psychologist 24(4):409-429, 1969.

Campbell, D.T. Methods for an experimenting society. Paper presented to the Eastern Psychological Association, September 5, 1971, Washington, D.C.

Campbell, D.T. Assessing the impact of planned social change. Evaluation and Program Planning 2:67-90, 1979.

Caplan, N., and Nelson, S.D. On being useful: The nature and consequences of psychological research on social problems. American Psychologist 28(2):199-211, 1973.

Caro, F.G. Leverage and evaluation effectiveness. Evaluation and Program Planning 3(2):83-89, 1980.

Covaleski, M.A., and Dirsmith, M.W. Budgeting in the nursing services area: Its management control, political and witchcraft uses. Health Care Management Review 6(3):17-25, 1981a.

Covaleski, M.A., and Dirsmith, M.W. MBO and goal directedness in a hospital context. Academy of Management Review 6(3):409-418, 1981b.




Cronbach, L.J.; Ambron, S.R.; Dornbusch, S.M.; Hess, R.D.; Hornik, R.C.; Phillips, D.C.; Walker, D.F.; and Weiner, S.S. Toward Reform of Program Evaluation. San Francisco: Jossey-Bass, 1980.

Cyert, R.M., and MacCrimmon, K.R. Organizations. In: Lindzey, G., and Aronson, E., eds. Handbook of Social Psychology. 2d ed. Reading, Mass.: Addison-Wesley, 1968. pp. 568-611.

Deutscher, I. Toward avoiding the goal trap in evaluation research. In: Caro, F.G., ed. Readings in Evaluation Research. 2d ed. New York: Russell Sage Foundation, 1977. pp. 221-238.

Deutscher, I., and Gold, M. Traditions and rules as obstructions to useful program evaluation. Studies in Symbolic Interaction 2:107-140, 1979.

Gardiner, J.A. Traffic and the Police: Variations in Law Enforcement Policy. Cambridge, Mass.: Harvard University, 1969.

Ginsberg, P. "Predicting the Institutional Impact of Regulatory Efforts: Some General Principles and Their Relationship to Mental Health Services." Ph.D. Dissertation, Department of Psychology, Syracuse University. Ann Arbor, Michigan: University Microfilms, 1982.

Glass, G.V.; Tiao, G.C.; and Maguire, T.O. Analysis of data on the 1900 revision of German divorce laws as a quasi-experiment. Law and Society Review 6:539-562, 1981.

Gouldner, A.W. Patterns of Industrial Bureaucracy. Glencoe, Ill.: The Free Press, 1954.

Granick, D. Management of the Industrial Firm in the U.S.S.R. New York: Columbia University Press, 1954.

Granick, D. Managerial Comparisons in Four Developed Countries: France, Britain, United States and Russia. Cambridge, Mass.: M.I.T. Press, 1972.

Kutchinsky, B. The effect of easy availability of pornography on the incidence of sex crimes: The Danish experience. Journal of Social Issues 29(3):163-181, 1973.

Lipsky, M. Street-Level Bureaucracy. New York: Russell Sage Foundation, 1980.

McCleary, R. How parole officers use records. Social Problems 24(5):576-589, 1977.

McCleary, R.; Nienstedt, B.C.; and Erven, J.M. Uniform crime reports as organizational outcomes: Three time series quasi-experiments. Social Problems 29(4):361-372, 1982.

McGuire, T.G., and Frisman, L.K. Reimbursement policy and cost-effective mental health care. American Psychologist 38(8):935-940, 1983.

Office of Technology Assessment, Congress of the United States. The Implications of Cost-Effectiveness Analysis of Medical Technology. Background papers 1 and 3, 1980.

Pondy, L.R. Two faces of evaluation. In: Melton, H.W., and Watson, D.J.H., eds. Interdisciplinary Dimensions of Accounting for Social Goals and Social Organizations. Ohio: Grid, Inc., 1977. pp. 3-17.

Prakash, P., and Rappaport, A. Information inductance and its significance for accounting. Accounting, Organizations and Society 2(1):29-38, 1977.

Pressman, J.L., and Wildavsky, A.B. Implementation. Berkeley, Calif.: University of California Press, 1973.

Prottas, J.M. People Processing. Lexington, Mass.: Lexington Books, 1979.

Ridgeway, V. The dysfunctional consequences of performance measurements. Administrative Science Quarterly 1:240-247, 1956.

Russell, L. Technology in Hospitals: Medical Advances and Their Diffusion. Washington, D.C.: Brookings Institution, 1979.

Seidman, D., and Couzens, M. Getting the crime rate down: Political pressure and crime reporting. Law and Society Review 8(3):457-493, 1974.

Skolnick, J.H. Justice Without Trial: Law Enforcement in Democratic Society. New York: Wiley, 1966.

Stake, R.E. Testing hazards in performance contracting. Phi Delta Kappan 52(10):583-588, 1971.

Taylor, S.J., and Bogdan, M. Defending illusions: The institution's struggle for survival. Human Organization 39(3):209-218, 1980.

Zeisel, H. The future of law enforcement statistics: A summary view. In: Federal Statistics: Report of the President's Commission. Vol. II. Washington, D.C.: Supt. of Docs., U.S. Govt. Print. Off., 1971. pp. 527-555.




Part III

Case Studies in Performance Measurement


Introductory  Comments 

Understanding comes from combining observation and thought. Observation is needed to determine the facts; thought is required to make sense of the facts, generalizing beyond the particular observations to explanations that may involve additional hypothesized variables and laws. Thought is so creative that it easily extends beyond observed facts, and it is also influenced by hopes and fears. In short, thought frequently controls what is observed, blunting the opportunity to correct mistaken views. This insulation of ideas is most likely in areas where the culture has established widely accepted stereotypes, values, or explanatory myths. Program performance measurement is one of these areas, because it is consistent with the widely accepted assumption that rational and scientific methods should be applied in program management. If our views on this topic are to be modified by additional facts, therefore, those facts need to be presented in a compelling form.

One of the most compelling formats is the case study. Its narrative quality, specificity, and frequent inclusion of personal characteristics increase readers' interest and ease of understanding as well as the credibility of the message. These same features account for the power of anecdotes in communicating to managers (Mintzberg 1973). Case studies present information in depth, but at a cost of representativeness and of breadth. Therefore, a number of cases are required to reveal both how similar processes may occur under diverse conditions and how different processes may occur in what appear to be similar conditions.

Kennedy (1979) suggested that generalization from case studies will be improved if the cases include a wide range of important sample and treatment attributes, have many important attributes in common with the population to which generalization is desired, have few unique attributes, and have common outcomes and common functions underlying these outcomes across cases. Case studies were selected for this book on the basis of two main criteria: maturation and diversity.


Maturation 

All cases chosen were required to have moved beyond the concept design, pretest, and field research stages of the performance measurement system and to have been tried out in practical operation. When systems are newly designed, or in the beginning steps of being implemented, the hopes, personal investment, and advocacy role of the system designers are likely to make descriptions of the systems excessively optimistic. For 2 consecutive years at conferences, developers of a simplified patient data form for a multicomponent community mental health center described the wealth of information about client flow among components that would be produced once the system became operational. Subsequent reports on the results achieved were much more muted; problems had blocked the planned scope of implementation. Similarly, the State of North Carolina was described for a 2- to 3-year period as being on the brink of initiating a large-scale management-oriented information system for mental health services. The system never emerged in a form to match the glory of the preimplementation vision.

The truth seems to be that preimplementation descriptions are frequently a part of the political persuasion process aimed at implementing the system. While some descriptions may be prepared simply to get reactions and suggestions from colleagues, others may be designed to convince a local funding or decisionmaking body that the system should be supported. Reactions from professional peers may be used to attest to the virtues of the proposed system. In fact, the simple acceptance of a paper describing the system for presentation at a professional meeting may be all the evidence of the system's virtue that was desired, since such evidence can be obtained quickly. By the time of the actual meeting, some 3-6 months after the proposal is submitted, the crucial decisions may already have been made. Another motive for preimplementation presentations is to increase commitment to, and acceptance of the inevitability of, the system. Evidence that a new system is actually being implemented is especially important when the system is to be imposed on others whose cooperation is needed, a condition characteristic of performance measurement systems.

To avoid false or untested presentations, the cases for this book all were required to have experienced efforts at implementation. In fact, most of the cases suggest that a series of implementations occurs as systems evolve. They become, in effect, multiple cases, as the systems themselves change in design and as the political and administrative conditions that surround them change.


Diversity 

Cases were chosen to represent a variety of service programs. This criterion was used both to examine the generality of conclusions across types of services and administrative conditions and to increase interest for readers. This goal was fairly well accomplished. Cases are included for housing, law enforcement, rehabilitation, mental health, education, and government administration. For some of these, only brief abstracts of already published reports are included, but those summaries report the general history of the case, and interested readers can refer back to the original reports for additional detail. Performance measurement systems also have been reported for Federal health care programs (Bureau of Community Health Services 1978), hospital services (Griffith 1978; Griffith et al. 1981), and the Federal Government itself (USOPM 1981).

Cases also have been obtained from programs initiated by the Federal Government and by States. Just as the Federal Government took a leadership role in encouraging many social service programs in the 1960s and 1970s, so it also played a major role in directing performance measurement systems for these programs. States and local agencies were likely to participate in such systems more as data providers than as planners, analysts, and users of such information. As Hatry reports in this volume, little regular productivity measurement took place at the State and local government level in the late 1970s. As leadership for service programs shifts to the States in the 1980s in response to the Reagan Administration's elimination of categorical Federal grant programs, States and facilities seem to be taking a larger role in developing performance measurement systems. For example, the 1984 National Conference on Mental Health Statistics featured talks by representatives of six States on their mental health program performance measurement systems. Two of the cases in this volume (Kimmel's and Ginsberg's) focus on State-level performance measurement systems.

One criterion that was not used for case selection should also be stated clearly: inspiration or guidance on how others should either institute or oppose performance measurement. Case studies are frequently presented as models (e.g., Alley et al. 1979), but that basis was not used here. One reason is that it is not yet clear that any performance measurement system is operating well enough to be considered a model. Another reason is that at this stage of development, it may be more important to learn about likely problems than to hear of instances where the obstacles were so slight that success could be attained.

The cases that follow show that performance measurement is still in a developmental stage. They record more failures than successes. To put the same judgment more optimistically, the trial-and-error process continues to yield feedback that prompts further modifications in performance measurement systems. If feedback about past cases is useful for the design of future performance measurement systems, the cases presented here will have been of value.


References 

Alley, S.R.; Blanton, J.; Feldman, R.E.; Hunter, G.D.; and Rolfson, M. Case Studies of Mental Health Paraprofessionals: Twelve Effective Programs. New York: Human Sciences, 1979.

Bureau of Community Health Services. Instruction Manual for the BCHS Common Reporting Requirements. Rockville, Md.: U.S. Department of Health, Education, and Welfare, 1978.

Griffith, J.R. Measuring Hospital Performance. Chicago, Ill.: Blue Cross Association, 1978.

Griffith, J.R.; Restuccia, J.D.; Tedeschi, P.J.; Wilson, P.A.; and Zuckerman, H.S. Measuring community hospital services in Michigan. Health Services Research 16:135-160, 1981.

Kennedy, M.M. Generalizing from single case studies. Evaluation Quarterly 3:661-678, 1979.

Mintzberg, H. The Nature of Managerial Work. New York: Harper and Row, 1973.

U.S. Office of Personnel Management. Federal Productivity Measurement. Washington, D.C.: U.S. Govt. Print. Off., 1981.




Social  Performance  in  an  Engineering  Framework 
The  LEAA  Experience 

Edwin  W.  Zedlewski 
National  Institute  of  Justice 


A common approach to developing performance measurement systems is to focus initially on program goals and work deductively toward specific objectives, relevant measures, and requisite data elements. This approach, which has been used successfully in defense systems analysis since World War II, also has been applied in such areas as the space program. It is particularly appropriate for missions where activities are mechanical or their consequences are highly predictable.
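The goal-to-data deduction just described can be pictured as a tree walked from the top down. The sketch below is hypothetical. The `Node` class, the `data_elements` function, and every objective, measure, and data element in it are invented for illustration and do not describe any system LEAA used.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical goal -> objective -> measure -> data-element tree."""
    name: str
    children: list["Node"] = field(default_factory=list)


def data_elements(node: Node) -> list[str]:
    """Work deductively from a goal down to its leaves (the data to collect)."""
    if not node.children:
        return [node.name]
    elements: list[str] = []
    for child in node.children:
        for item in data_elements(child):
            if item not in elements:  # keep first occurrence, preserve order
                elements.append(item)
    return elements


# Invented example hierarchy for illustration only.
goal = Node("Prevent crime", [
    Node("Reduce residential burglary", [
        Node("Reported burglaries per 100,000 residents", [
            Node("burglary reports"), Node("resident population"),
        ]),
    ]),
    Node("Speed police response", [
        Node("Median minutes from call to arrival", [
            Node("call timestamps"), Node("arrival timestamps"),
        ]),
    ]),
])

print(data_elements(goal))
```

Each data element is collected only because a measure, an objective, and ultimately the goal above it call for it. The whole structure therefore stands or falls with the assumptions about those goals.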

Successful transfer of this model to complex social service delivery functions depends upon the validity of assumptions so fundamental that they are often forgotten during the difficult tasks of measurement selection, data specification, and system implementation. The result is a measurement system that produces measures without clear evidence that those measures assess performance.

The measurement model assumes that the goals of the service or program are universally understood and agreed upon, that is, that program or agency performance is clearly defined. Certainly that assumption holds at some level of abstraction, such as "improve health" or "prevent crime." When goal statements are made more specific, however, they tend to acquire values peculiar to the speaker or designer. The development of a satisfactory definition of performance may, therefore, pose a severe obstacle to the design of a measurement system in the public sector.

A second fundamental assumption is that stated goals and objectives are achievable. This assumption may be well grounded when dealing with engineering systems, but it deserves closer scrutiny in the context of social programs, where outcomes are governed by more than physical laws. Scientists may have winced at the developmental pace implied by President Kennedy's promise to land a man on the moon within 10 years, but at least they knew that the laws of physics supported his proposition. When a parole officer sets a goal of reducing recidivism among his parolees by a modest 5 percent in the next year, he does so with far less conviction that some fundamental law of human behavior will operate in his favor.

Judgment on the technical feasibility, or achievability, of goals in social programs is an art at best and requires substantive knowledge within the program area. Rational goal-setting is complicated not only by these technical limitations but also by the political environment in which social programs operate. Goals in that environment are often established by lay people to enlist support for programs or agencies or to inspire staffs to work more zealously. Issues of feasibility defer to issues of program survival. Publicly stated goals, therefore, tend to embody exaggerated and lofty, if not completely unrealistic, expectations of what an agency or program will produce.

Lofty expectations pose no serious threat to an agency or program if they remain rhetorical. When they are used as an instrument of accountability, however, the agency is in serious trouble. This chapter sketches some of the experiences of an agency that labored for 10 years under these difficulties. Created in 1968, the Law Enforcement Assistance Administration (LEAA) was charged with solving "the crime problem." Consistent with its authority, LEAA provided technical and financial assistance to State and local criminal justice agencies through a variety of mechanisms. Over the 1970s, it administered a State block-grant program of financial assistance averaging $400 million per year. It sponsored research in crime control and local systems improvements in policing, prosecution and defense, court processes, and corrections. And it fostered the development of State and local information systems as well as national statistical reports on victimizations and system expenditures.

Not unlike the space program's mission to land a man on the moon, LEAA's mission was largely single-minded. Its charge was to reduce street crime, an ambitious mission for a Federal money-dispenser. Therefore, Congress, in judging LEAA's performance, used State and local crime rates as its yardstick. If crime rates fell, LEAA had performed well. If they rose, LEAA had done poorly. Looking back, it may seem surprising that the agency's administrators did not object to crime statistics as accountability measures and offer indicators of program developments and expenditures instead. Because at least part of the reason is historical, this chapter traces some of the history of the Federal entry into local crime control.


How  Crime  Became  a Problem 

Crime ranked as a secondary problem when compared with other societal issues and priorities of the early 1960s. Then came the 1964 elections. Lyndon Johnson offered a long-range vision of a Great Society. Senator Barry Goldwater wanted to draw the public's attention to what he felt was an eroding moral climate in the United States, with recent increases in reported crime a tangible demonstration of that erosion. During an address in St. Petersburg, Florida, he demanded to know how his opponent could ignore the 6,000 or so crimes committed in the last 24 hours (N.Y. Times Sept. 5, 1964).

Despite his poor showing in the election, Barry Goldwater succeeded in defining crime as a problem by citing statistics in the Federal Bureau of Investigation's (FBI's) Uniform Crime Reports and by assigning accountability for those State and local crime rates to the Johnson administration.

Unable to brush aside the public concern raised by Goldwater's calls for Federal action, Johnson nonetheless responded cautiously. His Great Society program envisioned an ambitious war on poverty and social inequity. A war on crime was viewed by some supporters as potentially repressive and racist. Moreover, the United States had a long history of locally determined criminal justice policy, and Federal initiatives raised the specter of a national police force under the control of Washington.


Finally, there was little information upon which to base a plan of Federal action. Except for the FBI's crime reports, there had never been a federally funded program for statistics on local criminal justice operations.

The Johnson administration's response to the crime problem appears to have been shaped by two primary factors: an appreciation of the dearth of information upon which to base a program and a determination to maintain the responsibility for crime control at State and local levels. The President's March 1965 message on crime, the first of its kind, created a national crime commission, the President's Commission on Law Enforcement and Administration of Justice, and sought legislation to establish a grant-in-aid program for law enforcement within the U.S. Department of Justice, thereby empowering the Attorney General to support innovative programs or pilot projects for possible later national replication (Johnson 1965).

The 19 Crime Commissioners chosen over that summer met for the first time in the White House in September. As on previous occasions, President Johnson stressed the connection between crime and the Great Society programs in poverty, disease, illiteracy, discrimination, and unemployment. With characteristic optimism, the President charged the Commissioners to "give us the blueprints that we need to banish crime" (Johnson 1966, pp. 982-983). The House and Senate acted with equal dispatch to draft the Law Enforcement Assistance Act of 1965 (Public Law 89-197), creating the Office of Law Enforcement Assistance (OLEA) and authorizing its modest $7 million budget. The bill was so uncontroversial that it passed the House by a 326-0 roll call vote and the Senate by a voice vote without opposition.

President Johnson's charge to the Commission epitomized the optimism of the era and established the performance expectations for the Federal response to the crime problem. There was little doubt in the minds of officials of the period that a solution existed. Finding the proper mix of technology, management, and leadership to eliminate crime was the mandate given to the Commission and to OLEA. The possibility of failure was barely considered.

The well-staffed Commission's final report, The Challenge of Crime in a Free Society, contained more than 200 recommendations and was accompanied by 9 supporting volumes on topics ranging from policing to drunkenness, plus a series of staff studies and consultants' papers.

The report addressed every aspect of the criminal justice system and, consistent with the Great Society tenets, linked crime to education, housing, and income. The agenda produced, however, argued more for system reform than for crime suppression. Report recommendations were directed toward seven objectives: reduction of criminal opportunities, new treatments for offenders, elimination of injustices, greater personnel expertise, research, additional resources, and citizen involvement (President's Commission 1967). The Commission maintained that crime could be controlled.

Meanwhile, OLEA had set up its machinery for funding. The agency emphasized action projects over research but placed high value on the innovativeness and transferability of ideas. Applications for (1) expanded facilities or resources or (2) the introduction of improvements already in common use elsewhere were discouraged. State and local systems held few innovative ideas, however, and the proposals that were received came primarily from universities rather than criminal justice agencies and focused more on training than on action. Their merits in advancing professionalism may have been clear, but their contributions to the war on crime were oblique.


The  Federal  Effort  Expands 

By 1966, public concern over crime had risen, and President Johnson's optimism over a speedy solution to the crime problem had fallen. Moreover, the FBI's crime index had risen 62 percent since 1960. In publicly recognizing crime as a problem in its own right, not just as an adjunct of poverty and racial injustice, the President signaled the Administration's acceptance of an amplified Federal role in crime control.

The  nature  of  the  Federal  role  was  spelled 
out  in  the  President's  1967  State  of  the  Union 
address.  Of  the  Commission's  seven  steps  to 
control  crime,  he  selected  financial  zissisteince 
as  the  most  appropriate  and  direct  mezms  of 
Federal  action.  The  substantive  details  of  the 
legislation  authorizing  the  establishment  of 
the  Law  Enforcement  Assistance  Admini- 
stration to  replace  OLEA  mirrored  the  Com- 
mission's recommendations.  A first-year 
authorization  of  $50  million  plus  a 
second-year  request  of  $300  million  would  be 
spent  on  grants  for  planning,  education. 


training,  information  systems,  research  and 
demonstration  programs,  and  operational  in- 
novations. The  structure,  strategy,  and  mag- 
nitude of  the  commitment  ensured  heated 
congressional  debate,  but  the  Omnibus  Crime 
Control  and  Safe  Streets  Act  (Public  Law 
90-351) was signed into law in June 1968.

The  final  bill,  however,  added  a substantial 
block-grant  program  under  which  States  could 
qualify  for  Federal  crime  funds  by  submitting 
an  annual  crime  analysis  plan.  And  the  bill's 
tone  was  decidedly  different  from  the  Com- 
mission's report,  which  had  concluded  with  the 
caveat:  "Controlling  crime  in  America  is  an 
endeavor  that  will  be  slow  and  hard  and  cost- 
ly" (President's  Commission  1967,  p.  291).  The 
Congress  wanted  speedier  reduction  in  crime 
than the Commission believed possible and
allocated only $7 million of the $300 million
authorized for 1970 for research purposes.


LEAA  Launches  Its  Programs 

LEAA  organized  itself  in  accordance  with 
its  mandates.  Three  offices  were  established: 
one to review State plans and administer the
block-grant  program,  one  to  develop  and  pro- 
mote innovation  in  the  field  by  providing  funds 
for  technical  assistance,  and  one  to  support 
research  and  statistics.  In  spite  of  ongoing 
bureaucratic  friction  throughout  the  program, 
every  State  participated  in  block-grant  fund- 
ing, fostering  a "New  Federalism"  bureaucracy 
that  flourished  until  1979  when  a new  auth- 
orization bill  limited  the  use  of  LEAA's  mon- 
ies for  planning. 

Without either a research base or sufficient
funds  to  initiate  large-scale  field  experiments, 
the  research  group  opted  to  study  action  pro- 
grams via  program  evaluations.  The  primary 
source  of  program  concepts  was  the  Crime 
Commission's  report,  so  the  demonstration 
programs  more  resembled  loose  experiments 
on  sensible  ideas  than  demonstrations  of 
proven  techniques  derived  from  other  research 
projects.  Scores  of  these  programs  were  de- 
veloped by  LEAA  staff  during  the  1970s,  and 
State  and  local  agencies  were  invited  to  com- 
pete for  participation  through  a selection 
process  similar  to  those  for  other  Federal 
grants. 

LEAA launched its first major initiative,
the Pilot Cities Program, in May 1970,
reflecting the combination of planning and
local action envisioned by the legislation.
Phaseout occurred in 1974 after the release of
a draft of a General Accounting Office (GAO)
report (1975) concluding that the program had
limited  benefits.  GAO's  criticisms  were  le- 
gitimate: Pilot  Cities  was  by  several  outcome 
criteria  unsuccessful.  Crime  rose  in  all  eight 
participant  cities  over  the  1970-74  period;  40 
percent  of  the  projects  undertaken  were 
judged to be unsuccessful; and only one
city, Norfolk, institutionalized the
research, planning, and evaluation capabilities of
the staff after funds ran out (Murray and Krug
1975).

Attribution  of  cause  and  effect  in  this  kind 
of  program  is  at  best  problematic,  but  the 
LEAA-funded  evaluation  cited  a number  of 
reasons  for  the  less-than-successful  outcomes: 
lack  of  technical  expertise  among  the  Pilot 
Cities  staffs,  resistance  to  the  program  at 
local  levels,  and  conflicts  over  the  goals  and 
objectives  of  the  initiative  (Murray  and  Krug 
1975).  One  source  of  conflict  was  the  dual 
goals  of  crime  reduction  and  systems  im- 
provement. As  crime  reduction  was  of  greater 
social  importance  than  systems  efficiency, 
projects  would  often  compete  for  funds  under 
a crime  reduction  argument  even  when  sys- 
tems improvement  was  a more  plausible  jus- 
tification. Another  source  of  conflict  was  the 
perceived  scope  of  benefits.  The  research 
teams  in  Pilot  Cities  were  concerned  with 
national  transferability  of  project  concepts, 
while  local  criminal  justice  practitioners  were 
content with local impact. Finally, the
evaluators identified a need to balance
innovation against simple improvement, which
need not be innovative.

By  1974,  however,  LEAA  was  well  into  its 
next major initiative, the High Impact
Anti-Crime Program. Both the title and the timing
suggest  that  LEAA  intended  to  continue  to  try 
for  short-term  crime  reduction.  The  program, 
launched  in  1972,  aimed  to  reduce  the  inci- 
dence of  five  specific  crimes  by  5 percent  in  2 
years  and  by  20  percent  in  5 years.  A second 
objective  was  to  improve  criminal  justice 
capabilities  via  the  demonstration  of  a com- 
prehensive crime-oriented  planning,  imple- 
mentation, and  evaluation  (COPIE)  cycle  in 
eight  American  cities. 

Like  Pilot  Cities,  High  Impact  relied  on  a 
strategy of locally based planning,
implementation, and evaluation. Here, however, the
commitment  to  evaluation  was  greater,  as  was 
the commitment of funds. One possible explanation
of the Pilot Cities' relative failure
was  that  $500,000  per  year  was  an  insufficient 
supplement  to  local  resources.  The  High  Im- 
pact cities  were  given  $20  million  each  for  the 
5-year experiment and could choose their own
crime-control projects, provided their decisions
were derived from a planning, analysis,
and  evaluation  process. 

Given  that  some  200  projects  were  operat- 
ing more  or  less  concurrently  in  the  eight  High 
Impact cities, attributing gains in crime control
to individual projects was impossible.
As  for  the  cumulative  effect,  the  problem  was 
academic.  The  LEAA-funded  evaluation  of  the 
program  found  that  only  one  of  the  cities  ex- 
perienced a decline  in  crime  rates  during  the 
1972-74  period  (Chelimsky  1976).  Attempting 
to  introduce  alternative  indicators  of  program 
performance,  the  evaluation  noted  that  cities 
did implement sound crime-oriented planning
using  the  sophisticated  COPIE  cycle.  Congress 
was  unimpressed,  however,  and  the  House 
denied  LEAA's  $262  million  request  for  a new 
5-year  anticrime  program.  The  agency's  1976 
reauthorization  (Public  Law  94-503)  never- 
theless retained  crime  control  as  LEAA's 
primary  mission. 

LEAA  did  sponsor  some  initiatives  directed 
at systems change and improvement, notably
the  Standards  and  Goals  Program,  begun  in 
1974  and  aimed  at  improving  the  quality  of 
justice  and  the  system's  efficiency.  The  Na- 
tional Advisory  Commission  on  Criminal  Jus- 
tice Standards  and  Goals  had  been  commis- 
sioned in  1971  to  formulate  the  first  national 
standards  and  goals  for  systems  improvement 
at  the  State  and  local  levels.  The  final  reports 
somewhat  paralleled  the  Crime  Commission's 
effort in scope and style. An essential dif-
ference was  that  200  broad  national  recom- 
mendations were  replaced  by  more  than  400 
specific  standards  or  reforms  organized  into 
goals  and  subgoals  (National  Advisory  Com- 
mission 1973).  Between  1974  and  1976,  States 
accepted  a total  of  $16  million  in  LEAA  grant 
funds  to  promote  and  institutionalize  the 
standards  at  State  and  local  levels. 

Few  standards  or  reforms  were  ever  adopt- 
ed, according  to  the  final  report  of  the  eval- 
uation. Even  fewer  could  be  attributed  to  the 
Standards  and  Goals  Program.  In  attempting  to 
bring  about  seemingly  reasonable  changes, 
LEAA  had  suffered  another  setback.  The 
evaluation  cited  scientific  uncertainty  of  the 
goals  desired,  cost  tradeoffs  between  goals 
and  current  practices,  and  differences  in  local 
values  as  reasons  for  program  failure  (Murray 
et  al.  1978). 


Evaluating LEAA's Performance

Few of LEAA's scores of criminal justice
initiatives over the 1970s received positive
evaluations. Yet it remains unclear whether LEAA's
performance was good or poor.

If  LEAA's  mission  is  defined  as  crime  con- 
trol and  crime  rates  are  chosen  as  the  per- 
formance measure, then LEAA performed
poorly.  The  incidence  of  crime  in  America 
today  is  higher  than  before  LEAA  was  formed, 
and  few,  if  any,  of  the  programs  were  able  to 
demonstrate  effectiveness  in  reducing  crime. 
Such an assessment is undoubtedly too harsh.
LEAA  appears  to  have  acted  consistently  with 
the  causal  model  of  its  legislation;  it  encour- 
aged planning,  analysis,  and  coordination  in 
the  name  of  crime  control. 

If  a program  adheres  to  its  mission,  but  ex- 
pected benefits  do  not  occur,  one  must  even- 
tually question  the  validity  of  program  or 
mission  assumptions.  The  absence  of  physical 
laws  in  social  engineering  makes  flawless 
judgments  rare,  but  LEAA  appears  to  have 
experienced  what  Weiss  (1972)  called  causal 
theory failure: a misspecification of the
hypothesized  linkages  between  program  ac- 
tivities or  interventions  and  expected  out- 
comes. Once  political  consensus  has  been 
reached  on  problem  definition,  moreover,  it 
changes  grudgingly.  LEAA  gradually  shifted 
from  crime  control  toward  a systems- 
improvement orientation, yet Congress retained
its crime-control perspective on LEAA
for 10 years.

The traditional emphasis of performance
measurement, whether under the rubric of
evaluation research or of management
science, has been accountability. Goals are
identified,  either  by  an  evaluator  or  manage- 
ment, and  activities  or  programs  are  rated 
relative  to  these  goals  according  to  a set  of 
selected  measures.  The  result  is  a scorecard  of 
sorts  for  management  or  for  oversight  groups. 
It  is  doubtful,  given  what  little  knowledge 
about  crime  control  existed,  that  anyone  could 
have  constructed  a sensible  scorecard  for 
LEAA  in  1969.  Its  resources  consisted  of  $300 
million  in  grant  funds  and  a set  of  promising 
but  untested  concepts  compiled  by  the  Crime 
Commission.  Its  partners  were  State  and  local 
criminal  justice  agencies  that  had  had  only 
rare  contacts  with  experimentation  and  State 
program  administrators  who  were  about  to 
experience  something  called  New  Federalism. 
Accountability  in  this  setting  was  not  only 
inappropriate  but  counter-productive  as  well. 

Under  a different  set  of  stated  goals,  LEAA 
might  have  been  rated  more  positively. 
Chelimsky  (1977),  for  example,  argues  that  an 
important  function  of  Federal  programs  is  the 


acquisition  of  knowledge  about  social  pro- 
cesses, and  LEAA  increased  substantially  our 
working  knowledge  about  crime.  Alternatively, 
LEAA  as  a technical  assistance  provider  de- 
veloped effective  programs  for  improving 
management practices in police agencies,
court  systems,  and  correctional  institutions. 

Alternative  definitions  of  performance  do 
exist  and  should  be  accounted  for  in  perform- 
ance measurement.  Most  groups  define  a pro- 
gram's goals  and  objectives  from  their  own 
perspectives and are likely to omit some
aspects of performance valued by others. More-
over, the  relative  importance  of  various  goals 
changes  over  time.  Whitaker  et  al.  (1982)  refer 
to  the  preoccupation  with  obtaining  one-time 
closure  on  goals  and  objectives  as  "a  method 
gone  amuck."  Definition  ossifies  the  measure- 
ment system.  Whichever  aspects  of  perform- 
ance (typically  those  most  readily  measured) 
are chosen become reified as performance
criteria by the measurement process. Aspects
of performance that are less readily defined,
less easily measured, or just plain overlooked
are  excluded  permanently  from  the  agency's 
performance  definition. 

In  addition,  performance  systems  measure 
achievement but not achievability, regardless
of  how  goals  and  objectives  are  specified. 
Setting  such  a goal  as  reducing  recidivism  is 
admirable,  but  quantifying  it  adds  remarkable 
authority. Analysts scoff at unmeasurable
objectives such as "reduce recidivism." Yet,
when  confronted  by  a measurable  statement 
like  "reduce  recidivism  by  5 percent  as  meas- 
ured by  X,"  they  rarely  ask  "Why  not  10 
percent?" 


Devising  Empirical  Measures 

Even though no algorithms to measure
achievability  of  objectives  in  criminal  justice 
exist  at  this  time,  there  are  nonetheless  some 
sensible ways to approach the task. Grizzle
(1982),  for  instance,  suggests  that  measures  of 
correctional  workloads  and  outcomes  can  be 
compared  with  at  least  four  benchmarks  other 
than  stated  goals:  national  standards,  per- 
formance in  similar  correctional  settings,  own 
performance over time, and predicted per-
formance as  derived  from  mathematical 
models  (simulations  and  cost  functions). 
Jacoby  (1982)  notes  that,  although  conviction 
statistics  in  prosecution  and  defense  are  sen- 
sitive to  variations  in  local  court  structures, 
indicators  of  average  workloads  are  reasonably 
stable  across  jurisdictions  and  time. 
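Grizzle's idea of comparing a correctional measure with benchmarks other than stated goals can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from Grizzle (1982): the class and function names are hypothetical, the comparison is reported as a simple difference (observed minus benchmark), and peer settings and the agency's own history are summarized by plain averages. Whether a positive difference is good or bad depends on the measure being rated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Benchmarks:
    """Hypothetical reference values for one correctional measure."""
    national_standard: float   # a published national standard
    peer_values: list[float]   # same measure in similar correctional settings
    own_history: list[float]   # the agency's own values in earlier periods
    model_prediction: float    # value predicted by a simulation or cost function


def compare_to_benchmarks(observed: float, b: Benchmarks) -> dict[str, float]:
    """Return observed minus each benchmark (positive means above it)."""
    return {
        "national_standard": observed - b.national_standard,
        "similar_settings": observed - mean(b.peer_values),
        "own_history": observed - mean(b.own_history),
        "model_prediction": observed - b.model_prediction,
    }


# Invented figures: average caseload per probation officer.
caseload = Benchmarks(
    national_standard=50.0,
    peer_values=[55.0, 60.0, 47.0],
    own_history=[44.0, 46.0, 45.0],
    model_prediction=49.0,
)
print(compare_to_benchmarks(48.0, caseload))
```

With these invented figures the caseload falls below the standard, below similar settings, and below the model's prediction, yet above the agency's own recent average, so the four benchmarks need not agree about whether performance is high or low.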




Goal  specification  and  achievability 
emerged  as  issues  from  recently  completed 
research  on  performance  measurement  spon- 
sored by  the  National  Institute  of  Justice  to 
stimulate  robust  conceptualizations  of  per- 
formance in  criminal  justice  agencies.  As  part 
of  their  efforts,  researchers  surveyed  the 
literatures  on  public-sector  performance 
measurement, assessing measurement phi-
losophies and  practices  (Grizzle  1982;  Jacoby 
1982; Whitaker et al. 1982). They also identified
topics in performance measurement that appear
to warrant additional empirical research.

The  first  of  these  themes,  defining  per- 
formance, concerns  the  development  of  a 
structure for performance measurement that
accommodates  multiple  and  conflicting  per- 
formance definitions.  Connolly  and  Deutsch 
(1980) use the term system-relevant constituencies
to identify collectively those
parties  with  an  interest  in  a given  agency's 
performance.  Naturally,  not  every  group  has 
the  same  interests.  The  commonality  among 
constituencies  is  that  each  has  performance 
expectations  and  is  seeking  information  rele- 
vant to  those  expectations.  When  agencies 
provide  relevant  information  to  constituen- 
cies, they  clarify  and  alter  performance 
expectations. 

Agency  performance  can  be  defined  as  the 
degree  of  congruence  between  the  services 
that an agency provides and the services its
constituencies  demand.  Under  such  a defini- 
tion, agency  performance  will  tend  to  be  rated 
more  highly  among  homogeneous  communities 
than  among  heterogeneous  ones.  If  constit- 
uencies can  agree  on  the  kinds  and  amounts  of 
services  they  desire,  then  the  task  of  agency 
management  is  simplified.  On  the  other  hand, 
constituencies  that  hold  diverse  views  on 
agency  service  priorities  are  likely  to  be  dis- 
satisfied with  any  compromise  package  of 
services  proposed  by  agency  officials. 
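The definition of performance as congruence between services provided and services demanded, and the claim that it favors homogeneous communities, can be made concrete with a small sketch. The measure below is an assumption for illustration only, not Connolly and Deutsch's (1980) formulation: each constituency's demand and the agency's offering are expressed as shares of effort across services, congruence with one constituency is taken as one minus the total-variation distance between the two mixes, and agency performance is the average over constituencies. The service names and shares are hypothetical.

```python
def congruence(provided: dict[str, float], demanded: dict[str, float]) -> float:
    """1 minus the total-variation distance between two service mixes.

    Each mix maps a service to its share of agency effort; shares should sum
    to 1. Returns 1.0 for identical mixes and 0.0 for disjoint ones.
    """
    services = set(provided) | set(demanded)
    gap = sum(abs(provided.get(s, 0.0) - demanded.get(s, 0.0)) for s in services)
    return 1.0 - 0.5 * gap


def agency_performance(provided: dict[str, float],
                       constituencies: list[dict[str, float]]) -> float:
    """Average congruence of one service mix across all constituencies."""
    scores = [congruence(provided, demand) for demand in constituencies]
    return sum(scores) / len(scores)


# Hypothetical constituencies, with desired shares for two police services.
homogeneous = [{"patrol": 0.6, "prevention": 0.4},
               {"patrol": 0.6, "prevention": 0.4}]
heterogeneous = [{"patrol": 0.9, "prevention": 0.1},
                 {"patrol": 0.1, "prevention": 0.9}]

# The like-minded community can be matched exactly.
print(agency_performance({"patrol": 0.6, "prevention": 0.4}, homogeneous))

# Try several compromise packages for the divided community.
for patrol_share in (0.1, 0.5, 0.9):
    mix = {"patrol": patrol_share, "prevention": 1.0 - patrol_share}
    print(patrol_share, round(agency_performance(mix, heterogeneous), 3))
```

Under this assumed measure every package tried for the divided community scores the same middling average, because whatever share one constituency gains the other loses; no compromise satisfies both, which is the dissatisfaction the text predicts for heterogeneous communities.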

The  agency  also  faces  a technology  man- 
agement problem.  Even  if  it  has  perfect 
knowledge  of  its  constituencies'  preferences, 
it  must  have  the  technical  expertise  to 
structure  its  resources  to  produce  a desirable 
combination  of  outputs.  The  concern  here  is 
whether  agency  management  understands  its 
service  delivery  technology  well  enough  to 
alter  its  mix  of  services  to  conform  to  con- 
stituency demands,  whereas  the  constituency 
expectations  issue  is  concerned  with  whether 
the  agency  can  identify  the  demands  and 
whether  it  tries  to  meet  them. 

The  third  problem  facing  the  agency  is  the 
social outcome problem: whether the agency's
products contribute to socially desirable
outcomes.  In  the  field  of  offender  rehabili- 
tation, for  instance,  corrections  officials  can 
exert  considerable  control  over  the  counseling 
and  training  given  to  a prisoner.  They  can 
exert  only  the  most  limited  control,  however, 
over  the  social  and  economic  pressures  ex- 
offenders face  once  they  are  released,  and 
these  pressures  are  often  sufficient  to  over- 
come the  positive  benefits  of  correctional 
programs. 

Each  of  these  aspects  of  the  achievability 
problem calls for a different measurement
perspective  and  strategy.  If  the  measurement 
system is needed for the accountability or
oversight  of  an  agency's  management,  it  must 
logically  focus  on  the  agency's  operations  and 
measure  services  produced,  how  they  are 
produced,  their  costs,  and  their  quality.  If,  on 
the  other  hand,  the  system  is  to  address  issues 
pertaining to mission effectiveness, its
measures must extend beyond the boundaries of
agency-controlled functions to include
indicators of social outcomes. Because these
outcomes are determined jointly by agency
programs and societal factors, the system
should include indicators of intervening
variables (employment, health, family status,
etc.) to account for competing explanations of
program  successes  and  failures.  When  system 
needs  include  a determination  of  agency 
effectiveness,  the  scope  of  the  measurement 
problem  expands  from  a public  administration 
or  management  science  issue  to  a more  open- 
ended  social  science  inquiry. 

Reorienting  Measurement  Processes 

This  conceptual  framework  for  performance 
measurement offers no immediate solutions to
the  scores  of  measurement  and  research  issues 
that  remain  in  the  field,  but  it  does  offer  some 
promise  in  problem  definition  and  philosoph- 
ical reorientation.  Like  evaluability  assess- 
ment (Wholey  1977),  it  acknowledges  different 
kinds of failure: inadequate resources, man-
agerial and  political  problems,  causal  theory, 
etc.  Like  multi-attribute  utility  theory  (Ed- 
wards et  al.  1975),  it  recognizes  differences 
among  constituency  preferences.  These  sim- 
ilarities are  more  technical  than  philosophical, 
however. 

Instead  of  adopting  an  accountability  per- 
spective to performance measurement, one
can  admit  that 

• outcomes of social programs are not
entirely predictable and

• outcome statistics should be studied as
much with curiosity as with oversight.

The  value  of  institutionalized  performance 
measurement  as  part  of  a learning  process  has 
been largely overlooked. In LEAA's early
years, a system of performance measures
oriented  toward  self-examination  could  have 
helped  establish  badly  needed  norms  for  pro- 
gram outcomes  and  pare  away  fruitless  lines 
of  investigation. 

Learning-oriented  measurement  processes 
benefit mature agencies as well, because
measures provide objective feedback. Con-
tinual measurements  of  routine  conditions 
establish  performance  norms  for  workloads 
and  services  delivered.  They  also  provide 
predictive  data  for  assessing  the  impacts  on 
objectives  caused  by  nonroutine  changes  such 
as  reorganizations  or  severe  budget  cuts.  Fi- 
nally, they  serve  to  validate  the  measurement 
process  itself.  Some  measures  remain  rela- 
tively constant  despite  repeated  attempts  by 
management  to  change  them.  Eventually, 
either the measure's validity or the
measurement object's susceptibility to management
control  must  be  examined. 

When performance measures are used to
stimulate  questions  rather  than  fend  off  ac- 
cusations, constructive  changes  in  manage- 
ment practices  and  in  the  measurement  sys- 
tem itself  are  encouraged.  Accountability  is 
no  longer  a question  of  whether  certain  norms 
or  quotas  were  met  but  whether  a manage- 
ment has  been  able  to  learn  and  improve  on 
the basis of its own statistics. New account-
ability criteria  emerge.  Did  the  agency  iden- 
tify and  discontinue  services  having  little 
impact  on  its  objectives?  Did  it  introduce 
changes  to  remedy  service  deficiencies?  Was 
it  able  to  reduce  service  costs  without  adverse 
effects  on  consumers?  Questions  like  these 
transcend  static  goals  statements  and  orient 
information toward the more basic and general
accountability  questions  in  the  Government: 
Did  the  agency  use  its  funds  rationally  and 
responsibly  for  the  general  public  interest? 

If one were to choose a single reason for
LEAA's  poor  performance  rating,  it  might  be 
the  agency's  failure  to  demonstrate  the  ra- 
tionality of  its  activities  despite  their  insig- 
nificant impacts  on  crime  rates.  Ignoring  the 
Crime  Commission's  admonitions  of  a long  and 
costly  war,  the  Congress  had  pressured  LEAA 
for  a quick  return  on  its  investment.  LEAA 
tried  to  produce  rapid  progress  and  failed.  It 
subsequently  failed  to  shed  its  crimefighter 
image, maintaining it for a decade despite the
fact that the Federal dollar contribution to
crime  control  was,  even  at  the  height  of 
LEAA's  funding,  a small  fraction  of  the  total 
national  expenditure. 

LEAA  made  no  systematic  attempt  to  doc- 
ument its  contributions  to  improving  systems 
efficiency  or  the  administration  of  justice.  It 
could document where and how funds had been
spent,  but  it  did  not  convincingly  link  these 
expenditures to a performance model that
contained  systems-improvement  objectives. 
Considering  the  crudeness  of  the  crime- 
control  definition  of  performance,  an  alter- 
native definition  need  not  have  been  very 
sophisticated;  simple  counts  of  State  and  local 
changes  plausibly  attributable  to  LEAA  funds 
might have been adequate. Having no
alternative interpretation of LEAA's value,
Congress logically persisted in using the only one
available.


Conclusion 

The National Aeronautics and Space Administration
closed its first decade by successfully
landing a man on the moon in July 1969, an
impressive victory for American engineering
and for systems analysis. An unfortunate side
effect of the space program, however, was the
superimposition  of  its  planning  and  evaluation 
technology  upon  a major  social  problem  that, 
in  retrospect,  had  none  of  the  characteristics 
necessary  for  a successful  application  of  an 
engineering  methodology. 

LEAA's  first  decade  of  activity  was  marked 
by  continual  frustration  for  both  the  Congress 
and  the  agency.  No  matter  what  strategy  the 
agency  tried,  it  did  not  seem  to  affect  the 
crime  rate.  This  Federal  drama  was  reenacted 
at  State  and  local  levels  over  the  same  period. 
Gradually,  however,  alternative  justifications 
for programs emerged: higher client and staff
satisfaction,  improved  management  capability, 
and  increased  fairness  in  the  administration  of 
justice.  These  alternative  justifications  were 
no more measurable than crime rates but were
more feasible goals and, therefore, more
useful  indicators  of  performance. 

Perhaps  the  most  significant  indicator  of 
the  learning  that  had  occurred  is  the  re- 
statement of LEAA's mission in its most
recent reauthorization, the Justice System
Improvement Act of 1979. In contrast to the
immediate-results tone of earlier legislation,
"Congress  further  finds  that  there  is  an  urgent 
need to encourage basic and applied research,
to gather and disseminate accurate and
comprehensive justice statistics, and to evaluate
methods  of  preventing  and  reducing  crime" 
(Public  Law  96-157).  The  Act  then  spells  out 
the kinds of activities that LEAA is authorized
to  conduct  and  outlines  the  kinds  of  informa- 
tion that  LEAA  must  provide  to  the  Congress 
so  that  Congress  can  fulfill  its  oversight 
responsibilities. 

The Congress has since received a variety of
quantitative and qualitative indicators by
which LEAA's performance can be assessed.
The  information  still  may  be  incomplete  and 
some  of  it  is  likely  to  be  irrelevant.  Different 
evaluations  will  be  made  by  the  users  of  the 
information,  depending  upon  their  values  and 
perceptions  of  national  priorities.  These  out- 
comes are  to  be  expected  in  the  measurement 
of  the  performance  of  social  programs.  We  can 
hope,  however,  that  the  second  decade  of 
Federal  effort  in  criminal  justice  will  see  the 
Congress  and  the  agency  adjust  priorities  and 
practices on the basis of more relevant
information. 


References 


Chelimsky, E. High Impact Anti-Crime Pro-
gram, National  Level  Evaluation  Final  Re- 
port: Vol.  I.  Washington,  D.C.:  National  In- 
stitute of  Law  Enforcement  and  Criminal 
Justice,  Law  Enforcement  Assistance  Ad- 
ministration, U.S.  Department  of  Justice, 
1976. 

Chelimsky,  E.  An  Analysis  of  the  Proceedings 
of  a Symposium  on  the  Use  of  Evaluation  by 
Federal  Agencies.  M77-39.  Washington, 
D.C.:  The  Mitre  Corporation,  1977. 

Connolly, T., and Deutsch, S. Performance
measurement: Some conceptual issues.
Evaluation and Program Planning 3(1), 1980.

Edwards, W.; Guttentag, M.; and Snapper, K. A
decision- theoretic  approach  to  evaluation 
research.  In:  Struening,  E.L.,  and  Guttentag, 
M.,  eds.  Handbook  of  Evaluation  Research. 
Vol.  1.  Beverly  Hills,  Calif.:  Sage,  1975.  pp. 
139-182. 

Grizzle, G. Basic Issues in Corrections
Performance. Washington, D.C.: National
Institute of Justice, U.S. Department of
Justice, 1982.

Jacoby,  J.  Basic  Issues  in  Prosecution  and 
Public Defender Performance. Washington,
D.C.:  National  Institute  of  Justice,  U.S. 
Department  of  Justice,  1982. 

Johnson, L. "Crime, Its Prevalence, and
Measures of Prevention." In: H.R. Document No.
103. Washington, D.C.: 89th Congress, 1st
Session, 1965.

Johnson, L. Public Papers of the Presidents of
the United States: 1965-2. Washington,
D.C.: Supt. of Docs., U.S. Govt. Print. Off.,
1966.

Murray,  C.,  and  Krug,  R.  The  National  Evalu- 
ation of  the  Pilot  Cities  Program:  A Team 
Approach  to  Improving  Local  Criminal  Jus- 
tice Systems.  Washington,  D.C.:  National 
Institute  of  Law  Enforcement  and  Criminal 
Justice,  Law  Enforcement  Assistance  Ad- 
ministration, U.S.  Department  of  Justice, 
1975. 

Murray,  C.;  Bourque,  B.;  Heinsohn,  I.;  et  al. 
The National Evaluation of the Standards
and  Goals  Project.  Vol.  1.  Washington,  D.C.: 
American  Institutes  for  Research,  1978. 

National  Advisory  Commission  on  Criminal 
Justice  Standards  and  Goals.  A National 
Strategy  to  Reduce  Crime.  Washington, 
D.C.:  Supt.  of  Docs.,  U.S.  Govt.  Print.  Off., 
1973. 

President's  Commission  on  Law  Enforcement 
and  the  Administration  of  Justice.  The 
Challenge  of  Crime  in  a Free  Society. 
Washington,  D.C.:  Supt.  of  Docs.,  U.S.  Govt. 
Print.  Off.,  1967. 

U.S.  General  Accounting  Office.  The  Pilot 
Cities  Program:  Phaseout  Needed  Due  to 
Limited  National  Benefits.  GCD-75-16. 
Washington,  D.C.:  GAO,  1975. 

Weiss,  C.  Evaluation  Research.  Englewood 
Cliffs,  N.J.:  Prentice-Hall,  1972. 

Whitaker, G.; Mastrofski, S.; Ostrom, E.; et al.
Basic Issues in Police Agency Performance.
Washington, D.C.: National Institute of
Justice, U.S. Department of Justice, 1982.

Wholey, J. Evaluability Assessment. In: Rutman,
L., ed. Evaluation Research: A Basic
Guide.  Beverly  Hills,  Calif.:  Sage,  1977.  pp. 
41-56. 




Performance  Indicators  in  Housing 


Vicki  Elmer 

U.S. Department of Housing and Urban Development, Region IX
San  Francisco,  California* 


Because  U.S.  Government-supported  housing 
services  are  intended  both  to  increase  the  re- 
cipient's ability  to  pay  for  adequate  housing 
and  to  improve  the  quality  of  the  unit  and  the 
neighborhood, housing can be regarded as a
social service delivery program as well as a
commodity.  The  dual  nature  of  housing  serv- 
ices makes  a discussion  of  performance  meas- 
ures to  evaluate  such  services  doubly  complex. 
It  also  makes  many  of  the  issues,  concerns, 
and  experiences  with  performance  indicators 
in  the  traditional  social  service  programs 
equally  relevant  in  the  field  of  housing. 

Expectations  about  what  Federal  housing 
programs  should  achieve  have  changed  radi- 
cally since  the  first  Housing  Act  of  1937. 
Three  different  approaches  to  performance 
indicators have been used over the past dec-
ades, corresponding  to  three  major  strategies 
for  providing  housing  services  used  by  the  U.S. 
Department  of  Housing  and  Urban  Develop- 
ment (HUD)  and  the  Public  Housing  Adminis- 
tration (PHA). 

Efficiency Indicators. Housing indicators
were first implemented as a market approach
to management characterized by the use of
efficiency measures. This approach was used during
the  period  when  FHA  mortgage  insurance 
constituted  the  major  Government  housing 
program.  Data  collection  mechanisms  and  the 
management systems that used the indicators
were simple and straightforward, and good and
bad performance was easily identified.

Program Process Indicators. The second
approach to performance indicators was
developed during the late 1960s, when Federal
housing programs were judged on their
program process, as such assistance was intended
primarily to supplement the market process.
Although  indicators  in  this  dimension  were 
more  difficult  to  develop  and  to  measure  than 
the earlier efficiency gauges, an effective
process indicator system was eventually
integrated into HUD's internal management
process,  contributing  to  record  levels  of 
housing  production  for  low  and  moderate  in- 
come people. 

Outcome Indicators. The 1970s and 1980s
have seen increasing concern about the effect
Federal housing programs have on neighbor-
hoods and on the people they serve. Because much
confusion exists about what can fairly be ex-
pected of  assisted  housing  programs,  dimen- 
sions along  which  to  evaluate  them  are  still 
being  developed.  Indicators  of  all  sorts  abound, 
but  data  systems  to  support  them  are  in  dis- 
array. When  the  appropriate  developmental 
work  is  done,  outcome  indicators  can  provide 
effective  tools  to  help  in  the  internal  manage- 
ment of the HUD bureaucracy, as well as to
monitor  local  impact. 

Efficiency Indicators and the Federal Housing Administration

Thompson (1967) has argued that organizations that focus upon planning or controlling production activities generally adopt a rational model of thinking to reduce uncertainty and aim at a criterion of maximum efficiency, that is, assessing whether

• a given effect was produced with the least cost, or

• a given amount of resources produced the most effect.
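Thompson's criterion can be stated as a ratio of output to cost. The short Python sketch below is an illustration only (the office names and figures are hypothetical); it shows that the two forms of the efficiency question rank organizations the same way, because cost per unit of effect is simply the reciprocal of effect per dollar.

```python
from dataclasses import dataclass


@dataclass
class Office:
    name: str      # hypothetical organization or field office
    output: float  # units of effect produced (e.g., applications processed)
    cost: float    # resources consumed, in dollars


def cost_per_unit(office: Office) -> float:
    """First form: for a given effect, lower cost per unit is more efficient."""
    return office.cost / office.output


def output_per_dollar(office: Office) -> float:
    """Second form: for given resources, more effect per dollar is more efficient."""
    return office.output / office.cost


def most_efficient(offices: list[Office]) -> Office:
    # Minimizing cost per unit and maximizing output per dollar pick the same office,
    # since one ratio is the reciprocal of the other.
    return min(offices, key=cost_per_unit)


offices = [Office("A", output=400, cost=20_000), Office("B", output=500, cost=30_000)]
print(most_efficient(offices).name)  # A ($50 per unit versus $60 for B)
```

As Thompson's argument implies, a ratio like this is meaningful only when the desired outcome is unambiguous and the output being counted is the outcome that matters.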


*Currently Assistant City Manager for Planning and Community Development, Berkeley, California.




He contrasted this rational model with a natural-system model that aims at the survival of a system through homeostasis governing necessary relationships between parts. However, he pointed out that efficiency measures are feasible only when organizations are unambiguous about what outcomes they desire and there is complete knowledge of cause-effect relationships. In this situation, performance indicators can objectively rate one organization or manager against a concrete standard that compares the quantity and timeliness of the output with its cost. Efficiency indicators are commonly used by many business organizations, as they were in the management of FHA mortgage insurance programs.

The public's expectations about Government-provided mortgage insurance were fairly clear when the program began: to insure homeowner mortgages when the private market did not dare to take the risk. The program was adopted in 1937, not to solve poverty or to redistribute income, but as a demonstration of the classic justification for Government intervention in welfare economics: market failure. In the mid-thirties the private market was hesitant to make long-term loans on homes. Hence, the new construction sector showed little to no activity, while the resale market for single-family housing had also dropped disastrously as a result of the Depression. The theory behind the FHA mortgage insurance program was that if Government guaranteed the loan, private lenders would have more confidence in the future, and the market would be induced to provide adequate financing to meet consumer needs with a minimum of Government intervention.

Dimensions  of  Performance 

Fiscal Solvency.—The fiscal solvency of the insurance fund ranked as the first important dimension of performance. Loss ratios, default rates, and other cost figures about the insurance funds themselves were used for long-term assessments of the program design, that is, the risk that FHA was willing to take in underwriting the loans and the cost of the service to the public. These measures were straightforward in concept, although they often required considerable methodological sophistication to calculate. The indicators were derived from documents used internally to process insurance applications and were prepared only once, on a national basis, to judge program performance. Although these indicators might have been useful for assessing the performance of local HUD offices in implementing the program, they have never been prepared on a disaggregated basis, possibly because other management devices, such as onsite file reviews, also provide information as to whether correct underwriting decisions had been made.
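The fiscal solvency indicators can be illustrated with a short Python sketch. The source does not give FHA's formulas, so the definitions used here are assumptions based on common insurance conventions (default rate as defaulted loans over insured loans, loss ratio as net claim losses over premium income), and the loan records are hypothetical. Computing the same indicators by field office, which the text notes was never done, takes only a grouping step.

```python
from collections import defaultdict
from typing import NamedTuple


class Loan(NamedTuple):
    office: str        # hypothetical field-office identifier
    defaulted: bool
    claim_loss: float  # net loss paid by the fund on this loan, in dollars
    premium: float     # premium income collected on this loan, in dollars


def indicators(loans: list[Loan]) -> dict[str, float]:
    """Default rate and loss ratio for a group of insured loans (assumed definitions)."""
    defaults = sum(1 for loan in loans if loan.defaulted)
    losses = sum(loan.claim_loss for loan in loans)
    premiums = sum(loan.premium for loan in loans)
    return {"default_rate": defaults / len(loans), "loss_ratio": losses / premiums}


def indicators_by_office(loans: list[Loan]) -> dict[str, dict[str, float]]:
    """The same indicators, disaggregated by field office."""
    groups: dict[str, list[Loan]] = defaultdict(list)
    for loan in loans:
        groups[loan.office].append(loan)
    return {office: indicators(group) for office, group in groups.items()}


loans = (
    [Loan("East", True, 3000.0, 500.0)]
    + [Loan("East", False, 0.0, 500.0)] * 3
    + [Loan("West", False, 0.0, 500.0)] * 4
)
print(indicators(loans))            # {'default_rate': 0.125, 'loss_ratio': 0.75}
print(indicators_by_office(loans))  # East: 0.25 and 1.5; West: 0.0 and 0.0
```

In this made-up example the national figures look acceptable while one office pays out more in claims than it collects in premiums, a pattern only disaggregated indicators would reveal.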

Processing Time.—The speed with which the mortgage insurance applicant was served was considered important for field office monitoring purposes. The indicators used to portray this dimension were the percentage of applications processed within 5 days for the first step in the process and within 3 days for the second step, with 95 percent of all cases to be processed within these time frames. This information was provided on a weekly, manual report from each field office and was used as part of an operational management system within FHA that included weekly reports to the FHA Commissioner. According to industry sources and FHA managers, this procedure seemed to work fairly well in holding managers accountable, both within the agency and to the general public, for day-to-day performance.
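The processing-time standard can be expressed as a simple weekly check. In this Python sketch, reading the 95 percent target as applying to each step separately is an assumption, and the report format and case data are hypothetical.

```python
def percent_within(days_taken: list[int], limit_days: int) -> float:
    """Percentage of applications processed in at most limit_days."""
    on_time = sum(1 for days in days_taken if days <= limit_days)
    return 100.0 * on_time / len(days_taken)


def weekly_report(step1_days: list[int], step2_days: list[int], target_pct: float = 95.0) -> dict:
    """Hypothetical field-office report: step 1 within 5 days, step 2 within 3 days."""
    step1_pct = percent_within(step1_days, limit_days=5)
    step2_pct = percent_within(step2_days, limit_days=3)
    return {
        "step1_pct": step1_pct,
        "step2_pct": step2_pct,
        "meets_target": step1_pct >= target_pct and step2_pct >= target_pct,
    }


# 20 hypothetical cases: one step-1 case and two step-2 cases ran over the limit.
report = weekly_report([2] * 19 + [7], [1] * 18 + [4, 6])
print(report)  # {'step1_pct': 95.0, 'step2_pct': 90.0, 'meets_target': False}
```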

Perverse Consequences of the Efficiency Approach

As long as the public continued to regard FHA as a private business run by the Government, these efficiency measures resulted in a tight management system and an agency that prided itself on the quality of its work and its contribution to building the suburbs of America. As attitudes began to change about what the public wanted from a Government housing agency, however, certain perverse consequences of the mortgage insurance program and its efficiency approach to management began to be seen.

First, only persons with enough income to qualify were permitted to receive insurance. This meant that poor people were not able to benefit from the program, whose benefits were indeed considerable. It is generally acknowledged that the FHA mortgage insurance program was responsible for raising the proportion of homeownership in the United States from 35 percent in 1929 to over 65 percent at present. Estimates of the mean yearly value of homeownership (as compared with renting) to the average family range from around $1,000 to $2,500, depending on the income tax bracket of the family (Aaron 1972). FHA practices, however, actively discriminated against minorities in distributing the benefits of the mortgage insurance program (Abrams 1965). In addition, because FHA was required to keep the insurance funds fiscally solvent, the organization would deliberately redline the poorer areas of the cities. Although consistent with the original program expectations, this practice meant that FHA served mainly middle-class, white suburbanites.

Although the efficiency indicator measures adopted by the decisionmakers reflected the early goals of the FHA mortgage insurance program, attitudes about those goals began to change as the post-World War II building boom slowed and the United States began to experience increasing affluence. Accordingly, new programs were proposed to overcome some of the limitations. These new programs were part of the Great Society effort in the mid-sixties, and they carried with them a need for an entirely different indicator system.

Process Indicators in Subsidized Housing Programs

The subsidized housing programs passed in 1968 were inspired by the leading intellectuals of the day, who sought to open up Government and market processes to the disadvantaged. Behind this concept was a commitment to the liberal pluralist tradition in America and a belief that, if given financial resources and technical assistance, poor people could compete successfully in the American governmental and market systems to obtain the things that the middle class and the rich enjoyed (Lowi 1968).

A major part of the belief system prevalent during the sixties was an unwillingness to impose middle-class values upon poor people. The liberal intellectual community felt that such an imposition would deny the multiplicity of values that makes up the strength of the American political process. This intellectual theory found positive expression in the advocacy planning movement, which had many adherents toward the end of that decade (Davidoff 1965). The movement sent trained professionals to work in the ghettos to help residents articulate their desires for the neighborhood and to help them fight city hall and developer interests to realize those desires.

The strategies employed by the housing subsidy programs reflect the liberal commitment to the market and democratic processes. The design of the section 235 and 236 housing programs was a triumph of the "market-tinkering" school. The private market was to remain the major mechanism for producing and distributing housing services. However, poor people would be served by these programs (as they were not by the FHA programs), because the Government would subsidize the interest payments to reduce their rental or home mortgage payments. The private market would still make the decisions on the design and location of the project. It would still build, rent, and manage the project. Therefore, the private market would take care of the quality of housing services poor people would receive. It was thought that, given the proper financial incentives, the market would distribute its benefits to the poor as well as to the middle class.

The theory behind the design of the market subsidy programs predicted that poor people would receive housing services similar to those received by the middle class. The theory assumed that the outcome of the housing market was desirable in every way except in how it discriminated against people without money. The major defect of the private market, in the eyes of the program designers, was that it forced poor people into low-quality, high-cost dwellings that consumed an unreasonable portion of a family's income. Some theorists of that era were also concerned about the location of the dwellings of low-income people in high-crime and low-amenity areas. The major principle of the program design, however, was to solve the problem of the unavailability of decent units and the high cost of housing to the poor family (USDHUD 1968). Therefore, the process rather than the outcome was the most important element of the programs. The outcome was taken for granted.

Program  Dimensions  and  Data  Systems 

Rather than focusing on whom the programs were serving and on what housing services (including locations) were received at what cost, HUD during this period emphasized the process of producing housing units. It was assumed that if monies were made available to the private market, units could be built and rented at prices poor people could afford, and the private market would meet the housing goals of this country: a decent, safe, and sanitary home for each individual and each family.




Accordingly, the set of indicators used to track the process emphasized the steps that HUD staff must go through to obligate funds to a developer or locality, as measured, for example, by the numbers of units started, completed, and finally occupied. The raw count of units produced, sometimes superimposed against a dollar-per-unit measure for the different housing programs, has been the major management tool used by HUD to ensure that housing services were being delivered. Processing times were used less successfully than they had been with the early FHA programs.
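The production counts that HUD tracked can be sketched as a small record per program. In the Python example below, the program label comes from the text, but the dollar and unit figures are hypothetical, and measuring dollars against completed units (rather than started or occupied units) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProductionStatus:
    program: str
    obligated_dollars: float  # funds obligated to developers or localities
    units_started: int
    units_completed: int
    units_occupied: int

    def dollars_per_unit(self) -> float:
        # Assumption: cost is measured per completed unit.
        return self.obligated_dollars / self.units_completed

    def pipeline(self) -> str:
        return (f"{self.program}: {self.units_started} started, "
                f"{self.units_completed} completed, {self.units_occupied} occupied, "
                f"${self.dollars_per_unit():,.0f} per completed unit")


status = ProductionStatus("Section 236", 2_400_000, 150, 120, 96)
print(status.pipeline())
# Section 236: 150 started, 120 completed, 96 occupied, $20,000 per completed unit
```

Note what such a record omits: who occupies the units, where they are located, and what condition they are in.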

However straightforward the dimensions of performance and their indicators were thought to be, their measurement by in-house automated data systems was another matter entirely. Because Secretary George Romney (1968-72) was personally interested in production of the subsidized units, his management system successfully produced the units using haphazard calculations and manual data systems. Later Secretaries, not as concerned with total output, found the data systems seriously incomplete, although as program planning and budgeting (PPB) finally became established within HUD (and these indicators were used for decisionmaking), the process data began to improve. The institutionalization of PPB in HUD consisted of describing in detail the internal HUD work activities necessary to let grants, issue mortgage insurance, or produce other program outputs. In addition, the kinds and amount of HUD staff needed to let a grant or get a unit built were specified (work measurement standards). This system enabled HUD managers to link the desired number of housing units to be produced with the number of staff required to accomplish that end. Given information about the internal HUD production functions, program managers could negotiate realistic performance contracts with the next highest level of managers about how many units would be produced at each processing stage. These contracts provided the basis of a formal internal performance indicator system that linked the program staff in the field through 8 or 10 levels of managers to the Secretary and thence to OMB and the White House.
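The work measurement standards amount to a linear staffing calculation. The Python sketch below assumes hypothetical stage names, staff-hour standards, and a 1,600-hour staff-year; none of these figures comes from HUD.

```python
import math

# Hypothetical work-measurement standards: staff-hours per unit at each processing stage.
STAFF_HOURS_PER_UNIT = {
    "feasibility": 2.0,
    "firm_commitment": 3.0,
    "construction_monitoring": 1.5,
    "occupancy": 0.5,
}
HOURS_PER_STAFF_YEAR = 1_600  # assumed productive hours per staff member per year


def staff_years_required(planned_units: dict[str, int]) -> int:
    """Whole staff-years needed to move the planned number of units through each stage."""
    hours = sum(STAFF_HOURS_PER_UNIT[stage] * units for stage, units in planned_units.items())
    return math.ceil(hours / HOURS_PER_STAFF_YEAR)


# A hypothetical performance contract: units to be processed at each stage this year.
contract = {"feasibility": 1_000, "firm_commitment": 800, "construction_monitoring": 600, "occupancy": 500}
print(staff_years_required(contract))  # prints 4 (5,550 staff-hours / 1,600, rounded up)
```

Run in reverse, the same standards tell a manager roughly how many units a given staff can be expected to process, which is the information the negotiated contracts relied on.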


Perverse Consequences of the Process Approach

Use of process indicators for housing, however successful it might have been in ensuring production, also caused the full range of perverse consequences that simplistic or prematurely restrictive indicators have had in other social service areas. Former HUD Secretary Romney's famous "pushout" production years of 1971 and 1972, intended to maximize construction starts, resulted in the equally famous defaults of 1973 and 1974 (USDHUD 1974). In the effort to maximize the number of units produced for poor people, concentrations of subsidized housing began to appear in the Nation's inner-city areas. Not only did this put great strains on localities' ability to provide physical and social amenities to these residents, but many also felt that HUD's programs actually promoted segregation, since a large proportion of the inner-city poor were members of minority groups. In 1970 a court decision (the Shannon case) concluded that HUD's lack of policy about the location of subsidized projects had contributed to the concentration of minorities in the inner city, in violation of the Civil Rights Acts of 1964 and 1968 (USDHUD 1974).

At the same time, concern was raised about the quality of the units produced, as well as the ability of large poor families to maintain them. Subsidized multifamily projects were often found to deteriorate rapidly from projects of middle-class appearance into rundown, ill-managed, slumlike dwellings (Teitz and Dodson 1975; USDHUD 1974). Finally, a general feeling persisted that Government subsidies for housing were not being equitably distributed to the poorest families.

Awareness was also growing in the early seventies that the market-tinkering programs were fine for the mass production of housing but not for accomplishing the higher goals the theorists thought the market would provide for the poor. The creation of poor people as another interest group competing in our pluralist tradition neither abolished poverty nor redistributed housing resources to the poorest population. Nor, more specifically, did the provision of money to the private market result in better housing services for the poor. In fact, many began to believe that the chief beneficiaries of subsidized housing programs were those in the housing industry. Early feedback on the startup of the Great Society programs, in terms of both formal evaluations and media reports, made it clear that the process goals of the programs had to be modified if the programs were to meet the expectations of the 1960s (National Center for Housing Management 1973).




Outcome Indicators and Social Goals of the Seventies

From 1970 onward, new programs and regulations were developed to regulate the market outcomes of assisted housing programs. Along with the programs came new management systems to hold the HUD field offices and other links in the housing production chain accountable for program and social outcomes. These systems represent the first steps toward an outcome indicator system for housing programs that links actions by various levels of HUD to the outside recipients of funds.

Regulations  as  Programs 

The programs of the seventies are basically add-on regulations to the preexisting process programs of the sixties. Although the subsidized programs of the 1960s were suspended by former President Nixon in January 1973, the replacement programs enacted in the Housing and Community Development Act of 1974 were essentially still market-tinkering programs. Provisions were made to avoid the default problem of the earlier programs, and strong efforts were made to influence the selection of clients, but reliance was still placed on the private market to produce and distribute the units.

The program innovations of the 1970s consisted of changes in how funds for housing were allocated from Washington, through the HUD offices, to the localities and developers. In addition, changes were made in certain affirmative action and deconcentration requirements that housing developers had to abide by to ensure socially desirable outcomes. As with the other social service programs, managers were interested in these outcomes: persons being served, type and extent of benefits received, and cost of the program, both social and to HUD. Central to the concern about the type and extent of benefits was the quality of the unit and its location, including whether the provision of the unit was having a racial and social integrating effect.

Confusion  About  Outcomes 

Although Congress made many changes to HUD programs in the Housing and Community Development Act of 1974 that were ostensibly concerned with program outcome, neither the Act nor its subsequent enabling regulations were very clear about what the outcomes ought to be. In fact, the community development side of this Act combined several categorical programs into a single community development block grant (CDBG), which was designed to further local, not Federal, goals. In many instances, the desired outcome appeared to be less a positive state than the cessation of a negative state. Program changes occurred in response to serious problems of financial stability in housing efforts assisted with Federal funds, of inequities in the distribution of resources, and of the tendency of HUD programs to "gild the ghetto" rather than encourage mobility out of it. However, there was little concrete notion of what constituted a positive program outcome.

Housing  Assistance  Plan  Requirement 

The change in the 1974 Act that came the closest to specifying outcome goals, and hence could provide the basis for an outcome-oriented performance indicator system, was the requirement that all community development block grant recipients prepare a housing assistance plan (HAP). The HAP was an effort to centralize planning for all the assisted or subsidized housing programs in a particular locality and to provide a guide for HUD's allocation of subsidized housing to the locality, so that Federal housing resources would match local needs and goals. The requirement was also an effort to

• centralize responsibility for meeting local needs in city hall or with the responsible chief executive of the area, and

• weaken private market control over decisions about the kinds and locations of subsidized housing in a locality.

Previously, despite many efforts by the community planning and development side of HUD to require localities to plan for their housing needs, the housing side of HUD (the old FHA, which had been incorporated into HUD in 1968) dealt directly with developers. This process had effectively bypassed any local planning mechanisms. What was new about the HAP was that it was to be used by HUD in its allocation of units to localities.

The HAP enforces local planning relative to developer performance by first analyzing housing need in the locality's jurisdiction, subdivided by homeowners and renters; by elderly, small-family, and large-family households; and by ethnicity and other demographic characteristics. Second, the HAP contains a numerical statement of goals for the locality, which are to be based on meeting 15 percent of the housing need in the categories mentioned. Congress set the areas of performance of the locality, and the HUD Central Office set the level at 15 percent of the need. In addition to being used by HUD to allocate housing subsidies, the goals provide targets for the locality to use in deciding how much of its block-grant allocation to spend on housing and on what kind of housing. HAP goals are also to be used by HUD to judge a locality's performance for future block-grant and housing funding.
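The HAP goal arithmetic can be sketched in Python. The 15 percent level comes from the text; the category names, need counts, and delivered units are hypothetical, and rounding fractional goals up to whole units is an assumption.

```python
GOAL_PERCENT = 15  # share of need set by the HUD Central Office, per the text


def hap_goals(need_by_category: dict[str, int]) -> dict[str, int]:
    """Goal for each category: 15 percent of need, rounded up to whole units (assumed)."""
    # Integer ceiling of need * 15 / 100, avoiding floating-point rounding.
    return {cat: (need * GOAL_PERCENT + 99) // 100 for cat, need in need_by_category.items()}


def goal_attainment(goals: dict[str, int], delivered: dict[str, int]) -> dict[str, float]:
    """Units delivered as a percentage of each nonzero goal."""
    return {cat: 100.0 * delivered.get(cat, 0) / goal for cat, goal in goals.items() if goal > 0}


need = {
    "elderly renters": 430,
    "small-family renters": 1_200,
    "large-family renters": 250,
    "homeowners": 90,
}
goals = hap_goals(need)
print(goals)
# {'elderly renters': 65, 'small-family renters': 180, 'large-family renters': 38, 'homeowners': 14}
print(goal_attainment(goals, {"elderly renters": 65, "small-family renters": 90}))
# {'elderly renters': 100.0, 'small-family renters': 50.0, 'large-family renters': 0.0, 'homeowners': 0.0}
```

In this hypothetical case a locality meets its elderly goal in full while delivering nothing to large families, a pattern that any single summary figure would hide.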

Problems With the HAP as Framework for an Outcome Indicator System

On the surface, HAP goals seem to provide the basis for an outcome-oriented performance indicator system that could hold the locality and every level of HUD accountable for delivering housing services to poor people. Although the HAP does not specify a reporting system and measurement method, it specifies indicators of performance that, although part of an imperfect system, will certainly be used by HUD staff to judge local performance in housing. However, a number of conceptual as well as implementation problems must be resolved before the HAP or its successor can be effectively used as part of a performance indicator system. These problems can best be discussed by organizing them around the four major elements of an effective performance indicator system:

• Dimensions to be used as performance criteria

• Indicators for those dimensions

• Measurement and reporting mechanisms, and

• A system for using the indicators

Problems With Identification of Dimensions of Performance

The lack of clear program objectives inhibits the development of useful performance indicators in any program. The U.S. General Accounting Office (1978) noted this deficiency in an overall assessment of HUD's evaluation system. In some respects, the HAP is a series of indicators without the dimensions. It focuses on serving the needy population, which is only one of three major program outcomes that should be part of an outcome indicator system, and it does not do even that well.

Beneficiaries Specified Imperfectly.—Although the HAP includes a mechanism for setting goals about who should be served, it does so in an imperfect manner. The HAP specifies, for example, the number of needy minority families, both large and small, as well as elderly households, who are homeowners and are in need of assistance. Goals developed from these needs do not adequately address equity issues, nor do they deal adequately with the goal of deconcentration of minorities called for by the housing legislation. In addition, the indirect benefits of housing rank among the most important reasons for the assisted housing program, and the beneficiaries of those indirect benefits are not always poor people. The HAP completely bypasses this issue.

Housing Benefits Not Specified in the HAP.—To determine whether a locality (and, by extension, the program on an aggregate basis) is meeting the housing needs of its population, it is important not only to track who is being served, but also to measure how they are being served and whether the provision of housing services resulted in an improvement in the recipients' quality of life. Part of this indicator area should concern the quality of the unit and its "fit" to the needs of the family, while the other part concerns the neighborhood in which the unit is located and the indirect benefits (such as better schooling) the location provides to the family. Good measures in this area for the population as a whole would, of course, also provide a sound basis for the development of improved goals. The HAP does not treat this subject.

Cost of Service Not in the HAP.—The third area not provided for by the HAP involves cost. Cost figures must be included carefully because of the enormous variation in housing prices across the country. Yet one of the most frequent criticisms of Government housing programs is that they are more expensive than what the private market would provide, without balancing benefits. Use of cost as a performance indicator at the local level and between programs within HUD could help ensure that HUD at least gets its money's worth. Costs of the service, of HUD staff input, and of local provider staff input should be part of the system. Indirect costs, such as loss of revenues to the Government through tax shelters and tax-exempt bonds, should also be included.


Problems With Specification of Indicators and Measurement Methods

Indicators of performance to use in the HAP process are developed primarily by the local HUD representative who monitors the CDBG program. Because the HAP does not contain indicators of performance as they are normally understood, assessments are situation-specific. For example, providing a family with a $2,000 loan for cosmetic repairs may count as much as the full renovation of a unit, which involves considerably more money. Or, if the city has never provided assisted housing for its low-income population, actions such as acquiring a site will count as moving the city toward meeting its HAP goals. Different HUD staff may come to different decisions about the efforts made by a locality to house its poor. Data on individual localities' performance are not aggregated to make comparative judgments nationally and regionally as part of an indicator system.
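The comparability problem can be made concrete with a short Python sketch. The $2,000 repair loan is from the text; the $38,000 renovation cost and both crediting rules are hypothetical, chosen only to show how much the choice of rule matters.

```python
from typing import Callable, NamedTuple


class Action(NamedTuple):
    description: str
    dollars: float


def credit_shares(actions: list[Action], weight: Callable[[Action], float]) -> dict[str, float]:
    """Share of a locality's total credit attributed to each action under a weighting rule."""
    total = sum(weight(a) for a in actions)
    return {a.description: weight(a) / total for a in actions}


actions = [Action("cosmetic repair loan", 2_000), Action("full renovation", 38_000)]

by_count = credit_shares(actions, weight=lambda a: 1.0)          # every action counts once
by_dollars = credit_shares(actions, weight=lambda a: a.dollars)  # credit proportional to spending
print(by_count)    # {'cosmetic repair loan': 0.5, 'full renovation': 0.5}
print(by_dollars)  # {'cosmetic repair loan': 0.05, 'full renovation': 0.95}
```

Neither rule is obviously right; a purely dollar-weighted rule, for instance, would give no credit for hard-won steps such as acquiring a site.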

Further analysis is needed to conceptualize standard indicators of benefit as well as measurement methods for them. Is a dollar of benefit the replacement value of the addition to the unit? Is it the cost of the addition? Is it the dollar value to the direct beneficiary? To the indirect beneficiary? What about benefits other than the housing units themselves, such as the racial integration of a suburb? Thought also needs to be given to interjurisdictional comparisons. Is a dollar of benefit in the South, where it is cheaper to build, worth a dollar of benefit in the Northeast and West, where building costs are high? What is an appropriate indicator of performance for a "reluctant suburb" that builds an elderly housing project while avoiding its "poor family needs"? And how would this be measured? Thought should also be given to the range of actions a city can take as it moves from planning for assisted housing to actually providing it, as well as to how to measure effort on the part of a city. For example, acquiring a site for assisted housing is much more difficult in California, with its active construction market and many zoning restrictions, than it is in the Midwest.
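One possible answer to the interjurisdictional question is to deflate benefit dollars by a regional construction-cost index before comparing them. The Python sketch below only illustrates that option: the index values are hypothetical, not published figures, and the approach does not settle the other questions raised above, such as how to value racial integration.

```python
# Hypothetical construction-cost index by region (national average = 100).
COST_INDEX = {"South": 85.0, "Midwest": 95.0, "Northeast": 115.0, "West": 110.0}


def cost_adjusted(nominal_dollars: float, region: str) -> float:
    """Express nominal benefit dollars in national-average construction dollars."""
    return nominal_dollars * 100.0 / COST_INDEX[region]


print(cost_adjusted(85_000, "South"))       # 100000.0
print(cost_adjusted(115_000, "Northeast"))  # 100000.0
```

Under this rule the two projects count as equal benefit even though their nominal costs differ by $30,000.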

Problems With Data Collection Mechanisms

Studies within HUD (Elmer 1979) and by a prominent national research organization (The Urban Institute 1979) indicate that system design and implementation problems riddle the Department's data systems on beneficiary characteristics. One report noted that not only are the data in the system incomplete, but the gaps appear to be systematically linked to variables of interest at the local level, such as type of population served. Large PHAs in the central cities serving predominantly minority populations fail to report more frequently than do their suburban counterparts. In addition, inconsistent and inaccurate data collection procedures at the local level cast doubt upon the data actually contained in the systems. Although both reports noted that, when properly weighted, the data at the national level came within 5 to 10 percentage points of information about the beneficiaries obtained from other and presumably more reliable sources, the same reliability was not present when the information was disaggregated below the national level. Therefore, these data systems could not provide performance indicators at the local or regional level.
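The reports do not say how the national data were weighted. The Python sketch below assumes one common approach, weighting each reported household by the inverse of its stratum's reporting rate, and uses hypothetical figures for central-city and suburban PHAs to show why differential nonreporting biases an unweighted tally.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    name: str
    reporting_rate: float     # assumed fraction of the stratum's households covered by reports
    reported_households: int
    reported_minority: int


def unweighted_minority_share(strata: list[Stratum]) -> float:
    return sum(s.reported_minority for s in strata) / sum(s.reported_households for s in strata)


def weighted_minority_share(strata: list[Stratum]) -> float:
    # Each reported household stands in for 1 / reporting_rate households in its stratum.
    minority = sum(s.reported_minority / s.reporting_rate for s in strata)
    total = sum(s.reported_households / s.reporting_rate for s in strata)
    return minority / total


strata = [
    Stratum("central-city PHAs", 0.40, 400, 320),  # 80 percent minority, low reporting
    Stratum("suburban PHAs", 0.90, 900, 180),      # 20 percent minority, high reporting
]
print(round(unweighted_minority_share(strata), 3))  # 0.385
print(round(weighted_minority_share(strata), 3))    # 0.5
```

The correction holds only if reporting agencies resemble nonreporting ones within each stratum, and it does nothing to enlarge the thin samples available for any one locality.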

For performance indicators that are not so easily quantifiable and when a subjective assessment needs to be made, a different approach to data collection is required, particularly if the results are to be aggregated. The HUD Central Office relies on in-depth briefings by field staff about a city's performance so that it can assess progress formally. These briefings are usually conducted only once a year, however, and the results are not aggregated and published. Although this process is intended for internal management purposes, with some systematization the results could also be useful to members of Congress, local officials, and others.

Problems  in  Using  Outcome  Indicators 

Finally, there are serious problems in the way HAP information is being used to make program decisions, because the outcome-oriented HAP process exists side by side with the process system of an earlier era. Although the internal system for using outcome indicators is strong, the institutionalization of managerial accountability for quantifiable process indicators throughout the HUD organization presents a conflicting force. The HAP is used to allocate housing funds to localities, to evaluate the progress of the localities in meeting housing needs, and eventually to measure HUD's progress. It offers a different perspective from the monthly system of face-to-face meetings among high-level officials to assess blame for process inadequacies.

The major deficiency of a system for using the outcome indicators lies not so much within HUD as in the larger housing environment outside HUD. The past 15 years have seen a continuing emphasis on devolving decisionmaking about Federal housing assistance to locally elected officials, rather than to the private housing market or Federal officials. However, the design of HUD's performance indicator systems has not matched this emphasis. Instead, they have been primarily Federal systems, for Federal purposes. It may reasonably be argued that if other major decisionmakers, such as Congress and locally elected officials, want performance indicator systems, they should initiate their own efforts. However, a convincing case can be made for HUD, in its capacity-building role, to develop and implement indicator systems that these important sectors can use as well.

Perverse  Consequences  of  Using  Both 
Process  and  Outcome  Indicators 

The major usefulness of an indicator system appears to be that it decentralizes control over procedural concerns to the level of the agency closest to the target of service, while maintaining accountability to the public for what the program accomplishes. Certainly, if the entire causal chain of a program is known, from inputs to outputs (process indicators) and from outputs to impacts (outcome indicators), there need be no tension in using both sets of indicators. However, when little is known about program implementation and its effects, discrepancies may occur as some program managers are held accountable for one set of indicators and others for another set.

Within HUD, these two systems of process and outcome indicators have proved to be incompatible. The problems have been predictable. HUD's outcome indicators are based on wishes, not on observed empirical evidence of the results reached by different housing strategies. The process indicators are used to manage month-to-month HUD activities, and they cause certain activities (i.e., production of units) to take place without regard to their effect on outcome goals. Process indicators such as construction starts and occupancy rates can be monitored more frequently than the impact indicators, and they can be aggregated across projects and localities. As a result, they are easier to use as criteria for program performance. This has led to a dominance of the process indicators in day-to-day management within HUD.

The dominance of process indicators has several unintended consequences. First, the HAP goals are supposed to govern the mix of program funds that a locality receives. The Office of Management and Budget, however, has made many attempts to change the national mix between new construction units and rent supplement units, in order to save money by increasing the proportion of the less expensive program. In addition, HUD follows the HAP mix for unit goals in its allocation process at the beginning of the fiscal year. By the end of the fiscal year, however, the HUD field offices have all but forgotten the HAP goals in the effort to obligate all appropriated moneys.

In the early days of the HAP process, many HUD program staff in the field offices figured out their locality's statistical fair share of subsidized housing units (according to census data about poverty populations and housing conditions) and then told the locality what to put in the goals portion of the HAP (Vitek et al. 1977). Administrative prerogative replaced the HAP goal-setting process, and the easiest units to fund got the money regardless of the local goals. For example, housing projects for the elderly were funded out of proportion to their need, relative to family housing. Developers found these projects easier to produce and manage. They often waited until the end of the fiscal year to submit proposals, so that HUD would be forced to accept the elderly projects in order to obligate all fiscal year funds. In addition, more rent supplement units (existing units) are allocated than planned, because no construction period is necessary and the units boost the overall occupancy figures for the Section 8 program.

Suburban public housing agencies (PHAs), which traditionally do not serve inner city residents, also appear to receive more than their fair share of Section 8 funds. These agencies are, for the most part, better managed than their inner city counterparts. When the yearend crunch occurs, they are in a better position to absorb extra funds quickly (Elmer 1979). Congress and the Secretary have placed heavy emphasis on process goals (i.e., production goals), and that emphasis runs down through the ranks of HUD. It has led HUD to disregard local goals when it suballocates units. It has also bred a cynical attitude about the local goal-setting process among most of those associated with it at the local level (deNeufville 1981).

Along with internal departmental tension about performance indicators, serious problems revolve around who outside the Department can be held accountable for performance. The lack of a single administrative body at the local level for all of HUD's housing programs is a major obstacle to establishing an outcome-oriented performance indicator system. Formally, the housing assistance plan is submitted by the chief executive of the locality receiving HUD funds, who is held accountable for reaching its goals (insofar as HUD can require this). In practice, however, the funds for housing go to a variety of actors, some reporting to the chief executive and others not.

Community development block-grant funds, which HUD administers, can be spent on the rehabilitation of housing as well as on other capital expenditures, such as sidewalks and parks. In practice, about 40 to 60 percent of these funds are spent on rehabilitation. The housing portion of these block-grant funds is to be planned in the HAP, and the funds are administered by a variety of agencies, depending upon the locality. In most parts of the country, however, these funds do not go to the same agency that manages housing programs. The PHA may in fact receive block-grant funds from the local block-grant agency, but it also receives funds directly from HUD for public housing programs and for the Section 8 existing housing certificate program (a type of rent supplement program). In addition, HUD funds for housing go directly to developers. If the chief executive feels that a project HUD is about to approve is not consistent with the HAP, he or she can register a protest with HUD (the Section 213 review), but this rarely occurs. In housing, the lack of accountability that performance indicator systems are expected to counter has as much to do with who is accountable as with the system itself.


Conclusions 

President Reagan's preference for local control has continued the emphasis on internal management or process indicators that was in favor at the time of his election. If the agency intends to shift emphasis to monitoring and evaluating the outcomes of its programs, it should make a comprehensive analysis of past research on the outcomes of the housing program. That analysis should identify gaps in empirical knowledge, gaps in theory, problems in the development of normative criteria, and problems of indicator specification and measurement. Eventually, proponents of local control will have to justify what difference HUD housing programs make to a locality, and without outcome-oriented data, the programs may fall victim to cost-conscious budget cutters.

It should also be noted that, until further descriptive empirical research has been done, it is foolhardy to pick normative criteria arbitrarily and hold localities and HUD program managers accountable for them. This is especially so when the criteria for a given situation may differ between HUD field offices. Such is the potential situation with regard to the HAP, which contains a plethora of goals, many of them not achievable with available programs. In addition, many of the goals about the number of families to be served in different demographic categories are based on data of doubtful accuracy (deNeufville 1981). Reasonable goals and the procedures for measuring them need further specification here.

Adopting performance indicators before program theory has been well developed, or before outcomes have been empirically validated, will result in premature programing (Landau et al. 1978). Managers will continue to wonder why the housing programs produce unintended and counterproductive consequences. Yet most of HUD's housing programs could lend themselves to an outcome indicator system once the required theoretical work and empirical research have been done.

Although it seems appropriate and feasible to develop and implement outcome indicator systems for housing programs, two factors make it likely that a full-scale, comprehensive indicator system will not be developed or adopted for another 5 or 10 years. One is the time it takes to develop and institutionalize a data system. The other is the lack of demand for such an indicator system within HUD. In the meantime, HUD would do well to ensure that all its programs are evaluated formally. The evaluation designs need not follow classic experimental or quasi-experimental models, but they should be oriented toward describing process and outcome and toward developing both descriptive and normative theory. Such evaluations will help to provide the information necessary for an outcome-oriented indicator system.

HUD's experience with indicator systems is probably not unique among social programs. Its outputs are tangible, and HUD is considerably involved in the production process. These two factors have made it easier for HUD than for mental health or education programs to establish process indicators for program management. Many local agencies in the softer social services are only now developing systems to monitor clients served and to record the type of service received, but HUD has already done this. However, HUD's confusion and difficulty with the next step, the outcome indicator system, could stand as a sober warning to other programs that dare to venture down this obstacle-strewn path. An outcome-oriented indicator system should be the result of careful specification of program interrelationships and careful observation of the range of empirical outcomes. To establish an outcome-oriented system without a clear understanding of what the program can realistically accomplish is folly. It will result in massive expenditures of money for unused and inaccurate data systems, disrespect for managers who push an unrealistic system, and a waste of human resources in an era when they are in short supply for social programs.

References 

Aaron, H. Shelters and Subsidies. Washington, D.C.: Brookings Institution, 1972.

Abrams, C. The City Is the Frontier. New York: Harper Books, 1965.

Davidoff, P. Advocacy planning. Journal of the American Institute of Planning, 1965.

DeNeufville, J. "Data Capacity at the Local Level in the Community Development Block Grant Program." Unpublished manuscript, 1981.

Elmer, V. "Data Quality Analysis: Section 8 Tenant Characteristics." San Francisco: U.S. Department of Housing and Urban Development, Region IX, 1979.

Landau, M., et al. "To Manage Is Not to Control." Institute of Government Studies, University of California at Berkeley. Unpublished manuscript, 1978.

Lowi, T. The End of Liberalism. Boston: Colophon Press, 1968.

National Center for Housing Management, Inc. Task Force on Improving the Operation of Federally Insured or Financed Housing Programs. Washington, D.C.: the Center, 1973.

Teitz, M., and Dodson, R. Evaluation Report: Multifamily Failures. (Berkeley Planning Associates, Berkeley, Calif.) Washington, D.C.: U.S. Department of Housing and Urban Development, 1975.

The Urban Institute. "Analysis of HUD Data (SHACO) Tapes." Memo to Carolyn McFarland of HUD. Washington, D.C.: the Institute, 1979.

Thompson, J.D. Organizations in Action. New York: McGraw-Hill, 1967.

U.S. Department of Housing and Urban Development. The President's Commission on Housing. A Decent Home. Washington, D.C.: USDHUD, 1968.

U.S. Department of Housing and Urban Development. Chapter 4 (1968), National Housing Policy Review. In: Housing in the Seventies. Washington, D.C.: USDHUD, 1974.

U.S. General Accounting Office. "HUD's Evaluation System: An Assessment." PAD-78-44. Washington, D.C.: GAO, 1978.

Vitek, T.; Christensen, K.; and Teitz, M. Volume 1, The local HAP process. In: Evaluation of Housing Assistance Plans in Meeting the Statutory Objectives of Linking Housing and Community Development. (Five volumes, 1977-78, M. Teitz, Project Director.) Berkeley, Calif.: Berkeley Planning Associates, 1977.


Developing  Performance  Standards  and  Measures  in  Vocational  Rehabilitation 


Susan  Stoddard,  Ph.D. 

Berkeley  Planning  Associates 
Berkeley, California


In the Federal-State vocational rehabilitation (VR) program, there is a long tradition of using a single outcome measure as the indicator of program success: employment of a disabled client 60 days after service and placement. An effort to develop more responsive measures was spurred by 1973 legislation that called for the development of evaluation standards for the program. Table 1 shows the program standards and data elements in the recommended system. This chapter traces the history of the system's technical development.


Background 

The Rehabilitation Services Administration's (RSA) vocational rehabilitation program provides resources to disabled persons with vocational potential. The VR program became law in 1920. Passage of the bill for civilian vocational rehabilitation was helped by a compelling economic argument: in terms of the national welfare, a citizen able to work and support himself or herself was preferable to a person who, because of disability, was dependent upon public support. Initially, the legislation was concerned with providing the physically disabled with medical services that would enable them to find jobs. In subsequent amendments, the scope of eligibility and services was expanded to include services to the family of the handicapped and to persons with psychological disorders, alcoholism, and drug abuse. The Rehabilitation Act of 1973 included a mandate to serve the severely disabled, those with the most handicapping conditions in need of intensive services.

Consistent with the historical emphasis on employment, the success of service to a client has been measured by whether the client is "closed rehabilitated," that is, placed in a work situation and still in the job 60 days after placement. Competitive employment has been the favored outcome. Sheltered employment and homemaker or unpaid family worker status may also be regarded as successes, because performance of these roles may free other family members to enter the work force.

The 1973 Act also contained a provision calling for development and use of program performance standards [Public Law 93-112, Section 401 (3)(6)]. As the program shifted its priority to serve the more severely disabled, successful placements became more difficult to achieve, particularly in competitive employment. The program confronted inflationary costs at a time when its measure of productivity showed declining effectiveness. The RSA's evaluation office saw the requirement for standards as a vehicle for developing alternative ways to measure program performance and describe achievement.


Program  Structure 

The VR program is administered by the States with Federal and State funds. RSA, the Federal funding agency, is concerned primarily with three things: broad policy goals for rehabilitation, ensuring compliance with the law, and providing State VR agencies with technical support and resources. The more than 80 State VR agencies that deliver services have their own planning and evaluation systems.

The steps, or "statistical reporting system caseload statuses," of the rehabilitation process are outlined in the Federal Rehabilitation Services Manual and are therefore standardized (Federal Register 1975). This process standardization has lent itself to the development of an RSA data base on all VR agencies, incorporating approximately 1 million client records annually (Abt Associates 1981).


The system includes a specific designation for the outcome of any client's program. Some clients are accepted for services but do not achieve placement in suitable employment. They are designated as status 30 if they leave the program before a written plan is initiated and as status 28 if they leave after. A status 26 indicates that the client has completed all steps in the rehabilitation process and has been determined to be suitably employed for a minimum of 60 days.
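The three closure codes just described can be captured in a small sketch. It covers only the statuses named here, for clients accepted for services. The function, its parameters, and the order of the checks are illustrative assumptions. They are not the RSA statistical reporting system itself, which defines many more statuses.

```python
from enum import IntEnum


class ClosureStatus(IntEnum):
    """Closure codes named in the text (accepted clients only)."""
    REHABILITATED = 26          # completed all steps; suitably employed >= 60 days
    NOT_REHAB_AFTER_PLAN = 28   # left after a written plan was initiated
    NOT_REHAB_BEFORE_PLAN = 30  # left before a written plan was initiated


def closure_status(plan_initiated: bool, completed_all_steps: bool,
                   days_suitably_employed: int) -> ClosureStatus:
    """Hypothetical helper that assigns a closure code to an accepted client."""
    if completed_all_steps and days_suitably_employed >= 60:
        return ClosureStatus.REHABILITATED
    if plan_initiated:
        return ClosureStatus.NOT_REHAB_AFTER_PLAN
    return ClosureStatus.NOT_REHAB_BEFORE_PLAN


print(int(closure_status(True, True, 75)))   # 26
print(int(closure_status(True, False, 0)))   # 28
print(int(closure_status(False, False, 0)))  # 30
```

`IntEnum` is used so that the codes compare equal to the plain integers (26, 28, 30) that appear in caseload statistics.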

Although competitive employment is the most highly valued goal, the VR program does recognize less than competitive placement outcomes (homemakers, unpaid family workers, workers in sheltered employment) in its 26 category. The number of rehabilitations, or successful placements, has been used as the single measure of VR program performance since the program began in 1920. When awarded by the counselor at the close of a case, status 26 is a measure of client achievement (employment). In rehabilitation statistics, the measure becomes an indicator of counselor performance, district performance, and State performance. RSA has published annual caseload statistics summarizing caseload activity and program performance. These include State-by-State and national counts of the number rehabilitated (291,200 in 1977) and the rehabilitation rate, or the proportion of closures that are 26s. Caseload statistics include State rankings for these measures, in which all 26s are counted equally. That is, the number of rehabilitations includes, but is not restricted to, those individuals placed in the competitive labor market.
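A minimal sketch shows how both published measures treat every status 26 alike. The record layout and placement labels are assumptions for illustration, and so is the choice to use every closure passed in as the rate's denominator. In the example, two hypothetical States have the same count and rate even though one places far fewer clients in competitive employment.

```python
from collections import Counter


def caseload_measures(closures):
    """closures: non-empty list of (status_code, placement) pairs; placement
    is a label such as "competitive" for status 26 closures and None otherwise."""
    statuses = Counter(status for status, _ in closures)
    rehabilitated = statuses[26]
    return {
        "rehabilitated": rehabilitated,
        "rehab_rate": rehabilitated / len(closures),
        # Not part of the published count or rate; shown to expose what they hide.
        "competitive_26s": sum(1 for s, p in closures if s == 26 and p == "competitive"),
    }


state_a = [(26, "competitive")] * 3 + [(26, "homemaker"), (28, None)]
state_b = [(26, "competitive"), (26, "sheltered"), (26, "sheltered"),
           (26, "homemaker"), (30, None)]
print(caseload_measures(state_a))
# {'rehabilitated': 4, 'rehab_rate': 0.8, 'competitive_26s': 3}
print(caseload_measures(state_b))
# {'rehabilitated': 4, 'rehab_rate': 0.8, 'competitive_26s': 1}
```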

Counselors, consumer groups, and rehabilitation evaluators have often expressed concern over the use of this single measure of success. They have argued that clients may be adversely affected if counselor decisions are influenced by performance on status 26 alone. One concern is that creaming, or choosing only the cases that are easiest to rehabilitate, would occur, especially if financial rewards were ever associated with such performance. A second distortion might occur if counselors set a less-than-optimal placement goal or service plan to reduce risk. For example, because all successful rehabilitations are counted as equal, a counselor might aim at easier noncompetitive outcomes.

RSA has limited its use of this performance statistic to description and has not used it to allocate funds. Since 1954, the allocation of rehabilitation funds among States has been established by the Hill-Burton formula, an algebraic equation that distributes funds on the basis of population, heavily weighted by per capita income (Ridge 1972a). Performance by a State does not earn Federal dollars for that State. The 1973 Act, in calling for Federal standards, expressed the possibility and even the desirability of such a linkage. Because the simple counting of rehabilitations concealed differences in State programs and caseloads, more measures of State performance were sought. The Act provided the impetus for developing more complex measures and reporting of State performance. It was hoped that the Federal requirement would lead States to adopt the new measures for their own internal management and allocation of funds.


The  New  Measures:  Round  One 

The first program performance standards published under the 1973 Act were defined by RSA and reviewed by members of the Council of State Administrators of Vocational Rehabilitation (CSAVR). The time between the legislative mandate and the scheduled announcement was short, however. As a result, these standards and their measures were not tested with State data. Debate focused on conceptual issues and on known results from previous studies or State experience.

The  performance  of  each  State  agency  was 
to  be  compared  against  the  performance  of  all 
State  agencies;  States  would  learn  about  their 
comparative  performance  after  each  State 
result  was  included  in  the  standards  analysis. 
Separate  performance  levels  would  be  set  for 
agencies  that  served  only  the  blind. 

The  standards  were  as  follows: 

1. To ensure that the rehabilitation program is serving the eligible disabled population and that these services are provided in an equitable manner

2. To ensure that rehabilitated clients are placed in gainful employment suitable to their capabilities

3. To ensure that undue delays are avoided in providing clients with VR services

4. To ensure that available resources are utilized to achieve maximum operational efficiency

5. To ensure that manageable-sized caseloads are maintained


Table 1. Vocational Rehabilitation program standards and data elements:

Final  recommendations 


Performance standards and data elements

1.  Coverage 

VR  shall  serve  the  maximum  proportion  of  the  potentially  eligible  target  population,  subject 
to  the  level  of  Federal  program  funding  and  priorities  among  clients. 

i.  Clients  served  per  100,000  population 

ii.  Percentage  severely  disabled  served 

2.  Cost-effectiveness  and  benefit- cost  return 

The  VR  program  shall  use  resources  in  a cost-effective  manner  and  show  a positive  return  to 
society  of  investment  in  vocational  rehabilitation  of  disabled  clients. 

i.  Expenditures  per  competitively  employed  closure 

ii.  Expenditure  per  26  closure 

iii.  Ratio  of  total  VR  benefits  to  total  VR  costs  (benefit- cost  ratio) 

iv.  Total  net  benefit  from  VR  services  (discounted  net  present  value) 

3. Rehabilitation rate

VR shall maximize the number and proportion of clients accepted for services who are successfully rehabilitated, subject to the meeting of other standards.

i.  Percentage  26  closures 

ii.  Annual  change  in  number  of  26  closures 

4.  Economic  independence 

Rehabilitated  clients  shall  evidence  economic  independence. 

i.  Percentage  26  closures  with  weekly  earnings  at/above  Federal  minimum  wage 

ii. Comparison of earnings of competitively employed 26 closures to earnings of employees in State

5.  Gainful  activity 

There  shall  be  maximum  placement  of  rehabilitated  clients  into  competitive  employment. 
Noncompetitive  closures  shall  represent  an  improvement  in  gainful  activity  for  the  client. 

i.  Percentage  26  closures  competitively  employed 

ii. Percentage competitively employed 26 closures with hourly earnings at/above Federal minimum wage

iii. Percentage noncompetitively employed 26 closures showing improvement in function and life status (implement after FAI/LSI pretest)

6.  Client  change 

Rehabilitated  clients  shall  evidence  vocational  gains. 

i.  Comparison  of  earnings  before  and  after  VR  services 

ii. (In addition, changes in other statuses and functioning ability, when such measures become available)






7.  Retention 

Rehabilitated  clients  shall  retain  the  benefits  of  VR  services. 

i.  Percentage  26  closures  retaining  earnings  at  followup 

ii. Comparison of 26 closures with public assistance as primary source of support at closure and followup

iii.  Percentage  noncompetitively  employed  26  closures  retaining  closure  skills  at  followup 
(implement  after  FAI/LSI  pretest) 

8.  Satisfaction 

Clients shall be satisfied with the VR program, and rehabilitated clients shall appraise VR services as useful in achieving and maintaining their vocational objectives.

i.  Percentage  closed  clients  satisfied  with  overall  VR  experience 

ii. Percentage closed clients satisfied with: counselor, physical restoration, job training services, placement services

iii. Percentage 26 closures judging services received as useful in obtaining their job or homemaker situation or in current performance


Procedural  standards 

9.  R-300  validity 

Information  collected  on  clients  by  the  R-300  and  all  data  reporting  systems  used  by  RSA 
shall  be  valid,  reliable,  accurate,  and  complete. 

10.  Eligibility 

Eligibility  decisions  shall  be  based  on  accurate  and  sufficient  diagnostic  information,  and  VR 
shall  continually  review  and  evaluate  eligibility  decisions  to  ensure  that  decisions  are  being 
made  in  accordance  with  laws  and  regulations. 

11.  Timeliness 

VR shall ensure that eligibility decisions and client movement through the VR process occur in a timely manner appropriate to the needs and capabilities of the clients.

12.  IWRP 

VR shall provide an Individualized Written Rehabilitation Program for each applicable client, and VR and the client shall be accountable to each other for complying with this agreement.

13.  Goal  planning 

Counselors shall make an effort to set realistic goals for clients. Comprehensive consideration must be given to all factors in developing appropriate vocational goals such that there is a maximum of correspondence between goals and outcomes: competitive goals should have competitive outcomes and noncompetitive goals should have noncompetitive outcomes.




6. To ensure that clients closed as rehabilitated retain the benefits obtained from the rehabilitation process

7. To ensure that the need for post-employment services is satisfied

8. To ensure that agencies are consistently identifying reasons why clients are not successfully rehabilitated

9. To ensure that the client is satisfied with the vocational rehabilitation services as developed with the counselor

These standards focus heavily on compliance with the spirit and management of the rehabilitation process. For each of them, data elements or statistical measures were prepared from regularly reported client and program data. For example, the data elements or measures defined for standard 2 address both gainful employment and its suitability to the clients' capabilities. These elements were the following:

i. Percentage of those placed in competitive employment (wage and salary earners and self-employment)

ii. Percentage of those placed in noncompetitive employment (sheltered workshops and others)

iii. Percentage of those placed as homemakers

iv. Percentage of those placed as unpaid family workers

v. Percentage of those placed in business enterprise programs

vi. Those who received training related to the job family in which they were placed (as identified by the first digit of the Dictionary of Occupational Titles code) as a percentage of the total number who received training

vii. Mean weekly earnings in the week before referral, including clients with zero earnings

viii. Mean weekly earnings at closure of all rehabilitated clients, including clients with zero earnings


Data elements i to v of this standard reflect the relative frequencies of various types of status 26 closure. Data element vi attempts to measure suitability by indicating whether the placement is related to the training program; comparison of elements vii and viii indicates the wage gain of clients in the program as a whole. The standard thus included descriptive measures as well as measures of change or appropriateness of service.
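A minimal sketch of how the eight data elements for standard 2 could be computed from closure records. The record layout, field names, and outcome labels are hypothetical; the text specifies only what each element measures. Element vi compares the first digit of the DOT codes for training and placement, as the text describes, and elements vii and viii keep zero earners in the means.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Hypothetical labels for the closure types behind data elements i to v.
OUTCOMES = ("competitive", "noncompetitive", "homemaker", "unpaid_family", "business_enterprise")


@dataclass
class Closure:
    """Hypothetical record for one client closed as rehabilitated (status 26)."""
    outcome: str
    training_dot: Optional[str]   # DOT code of training received; None if no training
    placement_dot: Optional[str]  # DOT code of the placement; None if not applicable
    weekly_earnings_referral: float
    weekly_earnings_closure: float


def standard2_elements(closures):
    n = len(closures)
    # Elements i to v: percentage of closures of each type.
    elements = {f"pct_{o}": 100.0 * sum(c.outcome == o for c in closures) / n for o in OUTCOMES}
    # Element vi: training related to the placement's job family (first DOT digit).
    trained = [c for c in closures if c.training_dot is not None]
    related = sum(
        c.placement_dot is not None and c.training_dot[0] == c.placement_dot[0]
        for c in trained
    )
    elements["pct_training_related"] = 100.0 * related / len(trained) if trained else None
    # Elements vii and viii: mean weekly earnings, zero earners included.
    elements["mean_earnings_referral"] = mean(c.weekly_earnings_referral for c in closures)
    elements["mean_earnings_closure"] = mean(c.weekly_earnings_closure for c in closures)
    # Comparing vii and viii gives the program-wide wage gain.
    elements["wage_gain"] = elements["mean_earnings_closure"] - elements["mean_earnings_referral"]
    return elements


# Hypothetical closures for illustration only.
closures = [
    Closure("competitive", "045.107-010", "045.107-022", 0.0, 200.0),
    Closure("noncompetitive", "709.684-010", "920.587-018", 0.0, 40.0),
    Closure("homemaker", None, None, 0.0, 0.0),
    Closure("competitive", None, "201.362-030", 60.0, 160.0),
]
elements = standard2_elements(closures)
print(elements)
```

Because the wage gain is a difference of two means over the same clients, it describes the program as a whole rather than any individual client's change.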

The norm for performance on most elements was set as plus or minus one standard deviation from the mean performance of all State VR agencies (depending on whether a minimum or maximum value for the element was desirable). Such an approach meant that for data elements with a normal or near-normal distribution, about 16 percent of the States would be out of compliance by definition, since roughly 16 percent of a normal distribution lies more than one standard deviation to one side of the mean. The standards were promulgated through publication in the Federal Register (Vol. 39, No. 128, July 2, 1974), with performance levels based on the past year's performance. States were required to submit data to RSA, which let a private-sector contract for analysis of State performance levels (JWK International Corporation 1977). States experienced a variety of hardships in preparing the required data for the standards. While some States were able to handle these requirements through automated processes, others had to do hand calculations. Some States had their VR operations within social service umbrella agencies and had difficulty obtaining data processing time or software support to respond to the new requirements. States varied in staff evaluation training and skills, and promulgation of the standards called for State agencies to use new methods and to develop evaluation skills. Specifically, new needs included the following:

• Capabilities for estimating the size and characteristics of the eligible disabled population

• State capacity for evaluating the effectiveness of the State agency service process

• Capabilities for determining the extent to which rehabilitation agencies achieve their objectives (Rubin 1975)

This call for program evaluator capacity found States faced with the challenge of developing methodologically sound program evaluation strategies for meeting the annual audit demands created by the new legislation (Rubin 1975). To help States respond to these new demands, RSA awarded a grant to the University of Arkansas Rehabilitation Research and Training Center. The Arkansas project involved three phases: a national program evaluation conference in Memphis in October 1974; a study phase, during which teams in all 10 RSA regions examined available data and the evaluation methods required by the published standards; and finally a meeting in New Orleans in April 1975, when State agency program evaluators reviewed the findings of the regional study teams.

These regional studies focused on conceptual and measurement problems in the standards and examined alternative approaches to such tasks as estimating the State disabled population, determining manageable caseload size, and measuring client satisfaction.

States themselves, in responding to the first announced standards, identified definitional issues and gaps in existing program data and questioned the selection of the standards and their measures. As States gained experience with these standards, definitional issues were clarified and report formats were developed and refined.

This  first  set  of  standards  met  the  re- 
quirements for  reporting  set  forth  in  the  Act, 
began  the  reporting  process,  and  introduced 
the  idea  of  evaluation  or  performance  stand- 
ards, but  it  did  not  meet  the  conceptual  test 
of  expressing  the  goals  and  decision  points  of 
the  program  to  the  satisfaction  of  most  of  the 
individuals  involved.  In  consequence,  a devel- 
opmental activity  was  supported  to  learn  from 
experience  with  the  first  standards  in  order  to 
develop  a better  system. 
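The first-round compliance rule described earlier in this section can be sketched as follows. For each data element, the norm sits one standard deviation below the mean of all State VR agencies when high values are desirable, or one standard deviation above it when low values are desirable. The function name and the State values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used. The last line uses the standard normal distribution to check the "about 16 percent" figure.

```python
from statistics import NormalDist, mean, pstdev


def compliance_flags(values_by_state, higher_is_better=True):
    """Flag each State as in (True) or out of (False) compliance on one data element.

    Hypothetical helper: norm = mean - 1 SD when high values are desirable,
    mean + 1 SD when low values are desirable (population SD assumed).
    """
    values = list(values_by_state.values())
    mu, sd = mean(values), pstdev(values)
    if higher_is_better:
        norm = mu - sd
        flags = {state: v >= norm for state, v in values_by_state.items()}
    else:
        norm = mu + sd
        flags = {state: v <= norm for state, v in values_by_state.items()}
    return flags, norm


# Hypothetical values of one data element for four States.
flags, norm = compliance_flags({"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0})
print(round(norm, 2), flags)

# Share of a normal distribution lying more than one SD below the mean.
print(round(NormalDist().cdf(-1.0), 4))  # prints 0.1587
```

Because the norm is defined relative to the States' own distribution, some States fall outside it every year regardless of how well the program as a whole performs.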


The  New  Measures:  Round  Two 

In  1975,  RSA  contracted  with  The  Urban 
Institute  to  use  a more  analytical  approach  to 
refining  the  standards.  The  Institute  had  pro- 
posed the  development  of  a simulation  model 
of  the  rehabilitation  system  and  the  ultimate 
setting  of  standards  performance  levels  based 
on  analysis  using  the  model. 

To build such a model, a number of problems had to be solved analytically. Of the many VR process, project, and program activities, The Urban Institute finally chose to focus on eight dimensions of client flow through the program: outreach, referral, client mix, service utilization, facilities utilization, timeliness, similar benefits (services obtained from other programs), and outcome mix.

The Institute staff examined each of these topic areas in a series of issue papers, suggesting further analysis mostly in the form of multivariate regression on specific system questions. For example, on referral, States could develop referral guidelines by using regression analysis to isolate the correlates of successful completion.
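The regression approach suggested for referral can be illustrated with a small sketch. The example below fits a linear probability model (a 0/1 completion outcome regressed on client characteristics) by ordinary least squares, solving the normal equations in plain Python. This is a swapped-in illustration, not The Urban Institute's specification; the predictors (age in decades, prior work history) and data are hypothetical, and for a 0/1 outcome a logistic regression is a common alternative.

```python
def ols(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: list of predictor rows (one per referred client); y: outcomes
    (1 = successful completion, 0 = not, giving a linear probability model).
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0, *map(float, x)] for x in X]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients not identified")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical referrals: (age in decades, 1 if prior work history else 0) -> completed?
X = [(2, 1), (3, 1), (4, 0), (5, 0), (2, 0), (3, 1), (6, 0), (4, 1)]
y = [1, 1, 0, 0, 0, 1, 0, 1]
intercept, b_age, b_work = ols(X, y)
print(intercept, b_age, b_work)
```

The sign and size of each coefficient indicate which client characteristics are associated with completion, which is the kind of evidence a referral guideline would draw on.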

The Institute also recommended some changes in the announced 1974/75 standards and suggested developing new standards from the topics explored in the issue papers. The Institute's recommendations typically called for procedures to ensure adequate performance (Turem et al. 1976). In its final report, the Institute criticized the existing standards system and recommended development of sophisticated statistical techniques for comparison of State programs, leading to a comprehensive microsimulation or overall evaluation framework (The Urban Institute 1976).


The  New  Measures:  Round  Three 

In the fall of 1976, RSA contracted with Berkeley Planning Associates (BPA) to further develop and refine the standards, drawing from previous efforts but using a new conceptual approach.

By this time, States had prepared one submittal on the RSA standards and were preparing another. BPA began its developmental effort by discussing the standards with administrators, researchers, and data personnel in several State rehabilitation agencies to determine the relevance and usefulness of the current standards for State operation, to identify implementation issues, and to learn State recommendations on revisions. In these interviews, it became apparent that State administrators had no clear sense of what RSA intended to do with the data submitted on the standards and thus had not emphasized the standards as a planning or managing tool. State agency personnel expressed frustration with incomplete instructions on methods and definitions to be used and complained of a lack of substantive feedback from RSA on their first-year submittals. In short, the standards were perceived as one more burdensome Federal reporting requirement rather than as useful for State agency management.

The short time frame for posting the standards had not allowed for RSA development of the reporting systems or technical assistance capacity necessary for smooth State implementation. State submittals were sent to Washington; the central office, however, did not have the staff to process submittals immediately or even to review the State documents and answer substantive questions on their contents. Eventually, this problem was alleviated through contracts for statistical analysis of the submittals and preparation of reports on State performance. The development of the report formats themselves took time, and only gradually did the States receive the standards information, sometimes years after the period in question (JWK 1977). This delay limited the utility of the information for managers.

The size of the reports also limited the usefulness and the distribution of the standards, with one complete set taking up several shelves. Although the reports were thorough in their presentation of comparative and historical information for each data element for each State, they did not lend themselves to wide distribution or to quick scans by administrators (Rehabilitation Research Institute 1981).


Identifying  Program  Accomplishment 

The States' reaction to the 1974/75 standards and the clear indication that the standards were not being used pointed to the need to step back and reassess the content and purpose of a performance standards system for rehabilitation. One general criticism of both the published standards and The Urban Institute work was that measures of processes, of compliance issues, and of program impacts were mixed together without an underlying conceptual framework (BPA 1977a, p. 158). BPA's new design effort began with an examination of alternative conceptual approaches to the development of standards. A review of standards-setting in other social service fields (BPA 1977a) showed a variety of approaches, from a focus on inputs (as either structural or gatekeeping eligibility standards) to processes (measures of best practice) to outcomes (or program impacts). After analysis of the strengths and weaknesses of these alternative designs, BPA (1977b) recommended an approach emphasizing program outcome and identified three levels of program questions and corresponding evaluation activities, as shown in table 2.

BPA postulated that compliance concerns were best handled through audits and other review procedures, and that performance evaluation should focus on the second- and third-level questions of relationship to and measurement of goal achievement. Many compliance questions relate specifically to whether the regulations are being followed rather than to program effects. Concerns at this level detract from a focus on outcome by concentrating on processes. BPA thus argued that standards and their data elements should be developed only for dimensions that directly measure program outcome. Other evaluation, monitoring, and research activity supportive of the standards could be carried out to investigate areas of problematic performance.

To identify the dimensions that should be treated in the performance standards, BPA analyzed the rehabilitation regulations and administrative manuals and developed a detailed model of the client flow process, identifying over 70 decision points. For each point in the detailed model, program problems and decisions were listed. These problem and decision lists in turn led to the development of an initial set of candidate areas for standards. An example of this analysis is shown in table 3. Through this process, 71 candidate areas for standards were identified by BPA, RSA, and an advisory committee representing CSAVR.


Table 2.  Levels of program evaluation activity

1.  Question: Is the activity in compliance with the regulations (whether or not it affects outcome)?
    Evaluation activity: Compliance audits, case reviews

2.  Question: Does the activity contribute to the achievement of program goals?
    Evaluation activity: Program monitoring, evaluation, and research on effectiveness of processes

3.  Question: Does the measure directly reflect program effectiveness, coverage, or impact in meeting stated objectives?
    Evaluation activity: Standards for program performance


Table 3.  Candidate area identification for a VR decision point (example)

Decision point:  Client provided services according to IWRP (written plan)

Problem(s):  Clients may not receive all the services specified in the IWRP in the amounts and in the time schedule specified, thereby rendering the service plan less effective and breaking the agency's agreement.

Decision(s):  Require agencies to investigate cases where movement through statuses exceeds time norms to determine whether there is a problem needing redress.

Candidate area(s):  Agencies shall ensure that clients move through the rehabilitation process in a timely and coordinated manner.

Source:  BPA 1977c, appendix 1, p. 16

Although the process of moving from analysis of the client flow to the identification of candidate areas provided the opportunity to be comprehensive, the resulting plethora of possible measures would be likely to obscure rather than clarify the measurement of performance. For administrators to become day-to-day users of the standards information, this set needed to be reduced to the essential performance measures for the program. Therefore, each candidate area was analyzed using the criteria listed in table 4. Analysts explored each measure's strengths and weaknesses on each criterion, documented related program research, and presented recommendations for the role of the candidate area in the standards system. Project staff, RSA, and the advisory committee discussed whether each of the 71 candidate areas should be regarded as performance standards, procedural standards, or supportive evaluation:

Performance standards measure the achievement of a desired outcome or mission of the program (e.g., competitive employment closures).

Procedural standards address protection of client interests by ensuring key processes but do not measure ultimate program performance (e.g., the timeliness concerns in table 3).

Supportive evaluation elements were defined as aspects of the VR process useful in the analysis of performance to explain differences and to help identify program actions enhancing performance on standards. These are the independent variables in causal models of the program, where performance outcome (O) is seen as dependent on a number of environmental (E), client (C), and process (P) characteristics: O = f(E, C, P).
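The causal model O = f(E, C, P) leaves the form of f open. One common way to estimate such a model, assumed here for illustration rather than taken from BPA's reports, is a linear specification across State agencies s, where E_s, C_s, and P_s are vectors of environmental, client, and process characteristics, the β terms are coefficient vectors, and ε_s is an error term:

```latex
O_s = \beta_0 + \boldsymbol{\beta}_E^{\top}\mathbf{E}_s
      + \boldsymbol{\beta}_C^{\top}\mathbf{C}_s
      + \boldsymbol{\beta}_P^{\top}\mathbf{P}_s + \varepsilon_s
```

Under this assumed form, the process coefficients β_P are arguably of most interest to managers, since process characteristics are the factors an agency can change most directly, while E and C act as controls when comparing States.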


Table 4.  Criteria for standards development

Criteria for conceptual soundness
    Appropriateness
    Validity
    Uniqueness (nonredundancy)
    Completeness
    Internal consistency
    Comprehensiveness
    Policy consistency
    Flexibility
    Compliance adequacy
    Explicitness of assumptions

Criteria for measures of concept
    Methodological utility
    Data quality

Criteria for management utility
    Efficiency
    Evaluative utility
    Controllability
    Cost-effectiveness

Criteria for ease of implementation
    Clarity
    Cost
    Controllability
    Face validity
    Capability




Analysis of the 71 areas resulted in 11 candidates for performance standards, 7 for procedural standards, and 17 for supportive evaluation. The remaining 36 areas were dropped from further consideration as having no direct relationship to program mission. Further analysis, refinement, and reduction followed to achieve a balance, so that individual standards would complement one another. This was necessary because some goals of the program are in tension. Obtaining high numbers of successful closures is desirable but should be done with concern for outcome quality and cost. In fact, the three major aspects of program performance (coverage, efficiency, and impact) can be regarded as the key tradeoffs in the management of the service program (figure 1). This further refinement reduced the number of performance standards to 8, procedural standards to 4, and supportive elements to 14. Standards were developed for coverage of target population, cost-effectiveness, rehabilitation rate, increased economic independence, competitive employment, vocational gain attributable to VR service, benefit retention, and client satisfaction. For each of the 8 standards, specific measures were developed.


Measuring  Performance 

Expressing a program value for clients such as "increased economic independence" or "use of resources in a cost-effective manner" is very different from specifying measures of such concepts. It was the measures, rather than the concept of standards or the standards themselves, that had met with criticism in the New Orleans regional reports, in The Urban Institute's work, and in most State critiques of the standards. The next design task for BPA was to recommend appropriate measures for each of the eight performance standards.

To  identify  the  most  appropriate  data  ele- 
ments for  the  standards,  BPA  first  reviewed 
the  availability  of  data  at  the  State  and  Fed- 
eral levels.  To  pretest  alternative  measures, 
BPA  analyzed  State  data  tapes  and  other 
relevant  sources  to  determine  which  of  the 
possible  measures  best  expressed  the  intent  of 
the  standard,  which  were  most  readily  con- 
structed from  existing  data  systems,  and 
which  would  be  of  most  use  to  program  eval- 
uators and  administrators. 

An  example  of  measurement  problems  and 
how  they  were  resolved  is  the  difficulty  in 
estimating  those  "potentially  eligible"  for 
service  as  specified  in  the  first  standard  on 
coverage  (see  table  1). 

No regularly collected population survey yields counts of individuals with both a handicap and vocational potential. Nor is it possible to derive these estimates through cross-tabulation or other manipulation of existing surveys (Ridge 1972b; Worrall 1974; Worrall and Schoon 1975).

Figure 1.  Program tradeoffs [diagram relating coverage (How many served?), efficiency, and impact]

In spite of the lack of a precise measure for the target population, State participants in the developmental process regarded coverage as an important aspect of performance in a quality program. Therefore, in the absence of a precise estimate, two coverage proxies were identified as measures of performance on this standard:

i.  Percentage of national caseload served / percentage of national VR budget

ii.  Caseload served per 100,000 State population (total)

Both of these measures could be constructed from generally available data, but each alone is vulnerable in that neither directly expresses the standard. Each does have face validity for administrators, however. Together, the two were seen as describing coverage effort while acknowledging agency resources. Moreover, if usable estimates of the target population became available later, the data elements or measures for the standard could be respecified. So long as the program mission and values remained the same, the standards would remain. Changes in program knowledge, data availability, or experience with use of the standards, however, might result in changes in the data elements, or even additions to the standards themselves, as measurement problems were solved.
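The two coverage proxies are simple ratios, sketched below for one State. The function and argument names and the figures in the example call are hypothetical; the text defines only the two ratios. On the first proxy, a value above 1 means the State's share of the national caseload exceeds its share of the national VR budget.

```python
def coverage_proxies(state_caseload, state_budget, national_caseload,
                     national_budget, state_population):
    """Return the two coverage proxies for one State (hypothetical helper).

    i.  percentage of national caseload served / percentage of national VR budget
    ii. caseload served per 100,000 total State population
    """
    pct_caseload = 100.0 * state_caseload / national_caseload
    pct_budget = 100.0 * state_budget / national_budget
    caseload_to_budget_ratio = pct_caseload / pct_budget
    caseload_per_100k = 100_000 * state_caseload / state_population
    return caseload_to_budget_ratio, caseload_per_100k


# Hypothetical State: 2% of the national caseload on 1.5% of the national budget.
ratio, per_100k = coverage_proxies(
    state_caseload=20_000,
    state_budget=15_000_000,
    national_caseload=1_000_000,
    national_budget=1_000_000_000,
    state_population=5_000_000,
)
print(round(ratio, 3), per_100k)
```

The first proxy relates effort to resources and the second relates effort to total population; as the text notes, neither substitutes for a count of the eligible target population.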

Analysis and debate followed in this fashion for the data elements of each of the other seven standards. Balance between elements, both within and across standards, remained an important design consideration. In 1978, RSA sent these standards and data elements to State agencies and to other rehabilitation researchers and evaluators with a call for comments and review. The next step would be a test of reporting systems and use.


Pretesting  Standards  and  Their  Data  Elements 

RSA had learned a lesson in the first round of standards development: the imposition of required standards and their data elements without a trial period had resulted in valid criticism. Rather than require a wholesale changeover to the new system, RSA sponsored a pretest of the new standards system as one part of a demonstration project to stimulate program evaluation activity in State rehabilitation agencies. Model evaluation units (MEUs) in six State VR agencies, varying considerably in caseload size and in the sophistication of their data processing systems and procedures, performed a number of tasks as part of their contracts, including collecting information for the new standards.

Federal Government procurement and information collection approval processes impeded the pretest of the standards to some extent. The contract for the pretest itself was not awarded until 1979, when the MEUs had been operating for a year. Further delay occurred while the information requirements for additional data items required by the standards were reviewed and finally approved by the Office of Management and Budget (OMB). Collection of State data for the pretest was nevertheless completed in 1980, with the six participating States each providing the client data needed for the new data elements and for the procedural standards as well. Performance for each of the six States, based on the new elements, was calculated as a part of the pretest. Data reliability and validity, as well as the MEUs' analysis of the usefulness of the data, were also reported (BPA 1981). State MEUs commented on their experience with the standards, including the time taken to install the system in their agencies. For the most part, the standards received favorable comments on their technical merits, as well as suggestions regarding refinement or more specific definition of data elements. BPA then used the experience of the MEUs and other reviewers to further clarify the data elements and to recommend a final set of standards for implementation. Table 1 (p. 96) shows the final set of recommended standards (BPA 1983).

Helpful as the pretest was, it fell short of adequately testing the utility of the standards as a management tool. No MEU developed reporting systems or special formats of its own related to the elements in the standards. In fact, throughout the pretest these MEUs regarded the standards as an external requirement. For their own management, MEU staff were more interested in systems they had developed for themselves (e.g., a caseload analysis model). The standards alone, without a framework for their use, did not appear likely to be spontaneously adopted by the State agencies they were supposed to guide.


Developing  VR  Decision  TREES 

The final component of the standards system was developed in response to the problem of specifying uses for the system. Even before RSA's standards system, States had extensive information and a sophisticated client data base. However, there are not many pragmatic models for using State data in management decisionmaking. The MEU pretest was intended as a test of the technical strengths of the selected data elements and the feasibility of information collection. It did not touch on information use, particularly by top management. So BPA analysts sought an approach that would link standards data with management decisionmaking. The result of this work was the development of The Rehabilitation Executive's Evaluation System (TREES), which provides guidelines to evaluators and statistical staff in the analysis of program data. TREES is a data-based decision support model, which identifies low-cost problem-flagging methods and suggests areas for agency action. Unlike the "supportive evaluation" approach taken earlier in the standards development, TREES lays out, step by step, an information analysis procedure to be followed if performance on any one data element in the standards system falls below agency expectation. Based on a logical decision tree system, TREES provides a blueprint for agency evaluation and self-diagnosis (Stoddard et al. 1983).

A State's performance on the data elements, BPA reasoned, should be compared to performance levels set for that period. In any given year, some agencies will not have met some of the objectives set for their level of attainment on the standards. The TREES system does not stop with this comparison or grading but instead moves to investigate the problematic attainment and to identify corrective actions. The purpose of this decision support system is to close the gap between reporting on the standards and actions based on the standards. The system:

• provides an ability to pinpoint causes for problems in attainment

• identifies strategies leading to enhanced attainment

• identifies appropriate policy recommendations and program actions that can be taken by State agencies, RSA, or Congress, based on the analysis and aimed at improvement in agency attainment

Achievement of these objectives requires a synthesis of first-hand familiarity with program operations, analytic techniques, and a sensitivity to policy concerns. Sensitivity to policy concerns is a most important consideration in terms of the overall design of the supportive evaluation system. Decisions are made by program managers, be they within RSA or within State agencies. The TREES component of the standards evaluation system was designed to inform decisions aimed at alleviating observed problems in agency attainment, as measured by the data elements.
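The TREES logic described in this section (compare a data element with the level expected for the period and, only if it falls short, walk a decision tree of diagnostic questions toward a suggested action) can be sketched as follows. This is not the actual TREES content: the node structure, the questions, the data fields, and the suggested actions are all hypothetical, chosen only to show the mechanics of a decision-tree drill-down.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One step in a hypothetical diagnostic tree; a node with no test is a leaf."""
    question: str = ""
    test: Optional[Callable[[dict], bool]] = None
    if_yes: Optional["Node"] = None
    if_no: Optional["Node"] = None
    action: Optional[str] = None


def review(actual, expected, tree, data, higher_is_better=True):
    """Return None if the element meets expectation, else (questions asked, suggested action)."""
    short = actual < expected if higher_is_better else actual > expected
    if not short:
        return None
    path, node = [], tree
    while node.test is not None:
        answer = node.test(data)
        path.append((node.question, answer))
        node = node.if_yes if answer else node.if_no
    return path, node.action


# Hypothetical tree for a rehabilitation rate below expectation.
tree = Node(
    question="Did the share of severely disabled clients rise?",
    test=lambda d: d["severe_share_change"] > 0,
    if_yes=Node(action="Consider whether the expectation should reflect the changed client mix"),
    if_no=Node(
        question="Is average time in service statuses above the agency norm?",
        test=lambda d: d["months_in_service"] > d["service_norm_months"],
        if_yes=Node(action="Investigate delays in service delivery"),
        if_no=Node(action="Examine counselor caseloads and service mix"),
    ),
)

data = {"severe_share_change": -0.01, "months_in_service": 14, "service_norm_months": 12}
result = review(actual=0.55, expected=0.60, tree=tree, data=data)
print(result)
```

Keeping the questions as data rather than code means analysts could revise the tree as program knowledge changes without rewriting the review procedure.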


RSA  Efforts  to  Introduce  the  Standards 

During the developmental process just described, there had been some change in Federal policy with respect to State VR data in general, and to evaluation standards in particular. Significantly, OMB disapproved the R-300 requirement, leaving the VR system without its uniform data base. The CSAVR, taking the position that the lack of a uniform national data base would weaken both the Federal program and the programs of individual States, urged its members to continue collecting the client data whether or not Federal requirements called for it. CSAVR proposed a revised data document that included most of the data elements in the previous R-300 form. By 1984, Federal policy was shifting again: the "R-911," designed to include the CSAVR comments and to become the basis for the new Federal reporting requirements, replaced the R-300.

In an environment where Federal policy and the State role were shifting, RSA officials took the position that, rather than require new evaluation standards and measures, the Federal office would "encourage the use" of the recommended BPA standards in State agencies. RSA distributed copies of the technical reports explaining the standards and the construction of the data elements, and RSA's Commissioner wrote to the States:

The present system is designed to accommodate the management needs of State agencies. I believe its adoption by you will be useful in setting objectives applicable to your State, provide you with a firm base for monitoring and program analysis and evaluation that is feasible. . . . We believe this Program Standards Evaluation System will help the States understand why their programs produce certain outcomes. It will also help you to identify possible problems within your State. . . . (Conn 1983a)

In 1983, an RSA officer was placed in charge of the implementation of the standards system. RSA's regional offices were assigned the responsibility of helping States develop an implementation strategy. The RSA central office surveyed States to determine how many were already using the standards and how many were considering adopting some or all of the measures.

An RSA work group composed of State and Federal officials was charged with:

1.  developing a strategy to help States wishing to incorporate the new evaluation system into their operations

2.  determining the best ways to facilitate States' use of the system

3.  determining how RSA resources could be used to facilitate States' use of the system (Conn 1983b)

In 1984, RSA supported a training contract aimed at providing States with an introduction to the standards system. Several RSA regional offices sponsored conferences to introduce States to the standards measurement system, as well as to other RSA-sponsored management tools.


The  States  Respond 

During much of the development of the standards system, from 1974 to 1984, State interest in the system was limited. Some State evaluation specialists participated in the CSAVR committees reviewing alternative measurement approaches, but for the most part, the standards received only cursory interest. In late 1983, however, the situation changed. House and Senate versions of the amendments to the Rehabilitation Act mentioned the need for evaluation standards for the VR program. Public Law 98-221, the 1984 Amendments, required national standards. The Evaluation Committee of CSAVR assumed responsibility within that organization for review of the State response to the proposed standards. In an extensive analysis of the system conducted in the latter part of 1983 and the early months of 1984, the committee examined each standard and data element in the recommended set and submitted its own recommendations, suggesting the elimination of some of the data elements and the rewording of some of the standards. The overall framework for the measurement system and most of the data elements remained the same (CSAVR 1984). The CSAVR report supports the concept of performance measurement by emphasizing that:

1.  Good  management  requires  a formal, 
ongoing  process  in  which  programmatic 
standards  are  set  and  the  performance  of 
the  agency  is  then  measured  against 
these  standards  in  an  attempt  to  answer 
the  question:  "How  well  are  we  doing?" 

2.  Any  set  of  standards  adopted  by  a State 
agency  must  reflect  accurately  the 
programmatic  philosophy  and  policies  of 
that  agency  and  use  data  elements  that 
are  reliable  and  valid  for  measuring 
performance. 

3.  Good  evaluation  does  not  come  cheaply. 
Once  the  standards  have  been  adopted 
and  the  data  elements  defined,  adequate 
resources  must  be  available  to  imple- 
ment the  evaluation  process,  or,  again, 
the  entire  effort  at  better  management 
goes  unrewarded  (CSAVR  1984). 

States have gradually begun to incorporate components of the performance standards into their own internal management systems. In Hawaii, for instance, the central VR office is using data elements from the performance standards to compare performance in its four districts and is using the TREES system to plan for State achievement of goals. In Colorado, supervisory staff from the field offices have been meeting with central office management to identify key measures of performance in their program and to design State management reports that track individual and office performance on the standards. In Wisconsin, elements of the system are incorporated into the State's internal resource allocation decision process.


Conclusion 

The rehabilitation system has traditionally benefited from its ability to describe its achievement in terms of a single outcome measure, the status 26. However, this measure neither differentiates between outcomes of different quality nor allows for difficult clients. Consequently, in response to a congressional directive to develop performance measures, RSA sponsored development of a system of standards to measure performance in several program dimensions. One standards system was announced, and subsequent developmental efforts resulted in a new system being implemented in part in several States. This developmental effort has taken several years and involved State administrators, RSA officials, and other rehabilitation experts in a careful examination of the processes of rehabilitation and the goals of the program. The resulting measures stress competitive employment as a quality outcome and, for the most part, relegate process and input concerns to case review and other compliance procedures.

Recommendations focus on the State as a key user of the system and stress the use of prospective performance goals rather than retrospective performance norms. The real test of success for this system will be its actual use by State agencies and the tailoring of the system to State needs. To help State evaluators and managers in using the system, BPA prepared an evaluation decision model, TREES, showing the use of the performance measures and other program data for diagnosis of program problems. The adoption of a set of performance measures is not an automatic process, as RSA's experience with the current standards has shown.

This  system  design  began  when  RSA  wanted 
a Federal  reporting  system  and  a central  Man- 
agement Information  System.  It  was  com- 
pleted in  an  environment  of  decentralization, 
when  even  a uniform  Federal  reporting  system 
was  in  question.  The  next  few  years  will  test 
the  utility  of  the  system  for  management  of 
the  States'  VR  programs. 


References 

Abt Associates. Comprehensive Management Information System for the State/Federal Vocational Rehabilitation Program. Final Phase I Report/Executive Summary. Cambridge, Mass.: Abt, 1980.

Abt Associates. Comprehensive Management Information System for the State/Federal Vocational Rehabilitation Program, MIS Final Systems Design. Cambridge, Mass.: Abt, 1981.

Berkeley Planning Associates. VR Program Evaluation Standards: A Critique of the State of the Art. Berkeley, Calif.: BPA, 1977a.

Berkeley Planning Associates. Alternative Conceptual Approaches to Standards. (Working Paper Number 2) Berkeley, Calif.: BPA, 1977b.

Berkeley Planning Associates. Specification of Candidate Areas for Standards, and the Conceptual Approach Underlying the Standards System. (Working Paper Number 3) Berkeley, Calif.: BPA, 1977c.

Berkeley Planning Associates. Program Standards Evaluation System Final Report Volume I: Report on the Program Standards Pretest. Berkeley, Calif.: BPA, 1981.

Berkeley Planning Associates. Vocational Rehabilitation Program Standards Evaluation System: Executive Summary. Berkeley, Calif.: BPA, 1983.

Cole, C.B., et al. Information Needs Assessment (revised). Cambridge, Mass.: Abt Associates, 1980.

Conn, G.A. Letter to State Directors, May 12, 1983. United States Department of Education, 1983a.

Conn, G.A. Letter to State Directors, June 7, 1983. United States Department of Education, 1983b.

Council of State Administrators for Vocational Rehabilitation. Final Draft Position Paper on Program and Project Evaluation Standards. Alaska Division of Vocational Rehabilitation, April 1984.

Federal Register. Vol. 40, No. 245, December 19, 1975. 1370.2 Definition.

JWK International Corporation. Validation and Analysis of VR Program Evaluation Standards Information, FY 1974 and 1975. Annandale, Va.: the Corporation, 1977.

Rehabilitation Research Institute. Analysis of FY 1979 Data on the Vocational Rehabilitation Standards. Ann Arbor, Mich.: University of Michigan, School of Education, 1981.

Ridge, S.S. The Allocation of Rehabilitation Funds Among States and Districts: An Evaluation. Berkeley, Calif.: University of California, Institute of Urban and Regional Development, 1972a.

Ridge, S.S. Estimating the Need for Rehabilitation Services. Berkeley, Calif.: University of California, Institute of Urban and Regional Development, 1972b.

Rubin, S., ed. Studies in the Evaluation of State Vocational Rehabilitation Agency Programs. Little Rock, Ark.: University of Arkansas, Rehabilitation Research and Training Center, 1975.

Stoddard, S.; Rogers, M.; and Langlois, S. The Rehabilitation Executive's Evaluation System (TREES). Berkeley, Calif.: BPA, June 1983.

Turem, J.S.; Koshel, J.; D'Amico, R.; La Rocca, J. Executive Report on the Performance Standards of the Vocational Rehabilitation Program. Washington, D.C.: The Urban Institute, 1976.

The Urban Institute. Final Report on Refinement and Expansion of the General Standards for the Evaluation of the Performance of the Vocational Rehabilitation Program. Washington, D.C.: the Institute, 1976.

Worrall, J. "Some Thoughts on the Nature and Size of the VR Target Population." New Brunswick, N.J.: Bureau of Economic Research, Rutgers University, 1974.

Worrall, J., and Schoon, C. Methodologies for the estimation of the VR target population: An exploratory analysis. In: Rubin, S., ed. Studies in the Evaluation of State Vocational Rehabilitation Agency Programs. Little Rock, Ark.: University of Arkansas, Arkansas Rehabilitation Research and Training Center, 1975.




State Mental Health Program Performance Measurement
Selected Impressions From Three States*

Wayne A. Kimmel
Potomac, Maryland


Introduction 

This paper examines performance measurement and monitoring (PMM) in Pennsylvania, Tennessee, and Colorado, based on reports and commentary from State and local practitioners at the end of 1982. Each of the three States had adopted a somewhat different PMM approach, and each was at a different developmental stage. A brief summary is given for each of the three systems. The longest is for Pennsylvania, which has the most elaborate system and had been operational for several years. The shortest is for Tennessee, which was in a pilot phase. General observations about PMM based on all three cases conclude the discussion.

The  paper  is  not  definitive,  exhaustive,  or 
technical,  but  rather  exploratory,  selective, 
and  impressionistic.  Like  the  thinking  about 
and  practice  of  performance  measurement  and 
monitoring  in  mental  health  itself,  it  is  pro- 
visional and  open-ended.  The  presentation  is 
intended  to  stimulate  discussion,  not  end  it. 

Pennsylvania’s Performance Measurement System

The Pennsylvania mental health program performance indicators and allocation process is directed by the Office of Community Programs of the Office of Mental Health. The State's 67 counties are combined into 43 county-level programs. At that level, mental health is combined administratively with mental retardation and a mixture of aging, child welfare, and optional drug and alcohol programs. Except for 3 to 4 small counties that provide some direct services, county-level units purchase mental health services through over 1,000 contracts. By and large, county offices perform administrative, contract management, and monitoring functions. The State's allocation process, which includes use of a set of performance indicators, covers county-contracted mental health programs and not those of State hospitals. In fiscal year 1982-83, State funds for local mental health services totaled about $116 million. Of this amount, about $3.5 million was to be allocated through the use of performance indicators. Annual budget increases or decreases and funds from the Federal block grant are all distributed by use of the indicators (allocation factors).

*This paper is a shortened version of a contract report to the Office of State and Community Liaison, National Institute of Mental Health (Kimmel 1982).


The Pennsylvania Performance Indicators Approach

Pennsylvania began the development of performance indicators and standards in 1978. After several iterations of refinement, a performance-based allocation process now rests on the use of data generated by local programs to produce total "z" scores (an adopted symbolic convention) that represent the overall annual performance of each county program. Most of the State's annual budget increase over the past 3 years has been allocated on the basis of each county's relative position on "z" scores. The performance indicator component of the allocation process has been roughly in its current form since 1980. Though it continues to undergo refinements, the system has reportedly reached a point where further major changes would require basic modifications in local and statewide data systems. Such fundamental changes appear unlikely.

The Director of the Office of Community Programs reports that the use of performance indicators is now part of the "mythology" (accepted practice) of the State system. Most county administrators are said to agree that the basic indicator criteria are reasonable. Some once argued heatedly against using indicators to allocate State funds and resisted it politically. They reportedly now argue over how the criteria should be defined and interpreted, not whether there should be performance criteria in the first place.


Brief History of the Indicator-Based Allocation Process

In  1977  when  a new  Director  of  the  Office 
of  Community  Programs  joined  the  staff,  the 
State  had  virtually  no  community  services 
development  and  monitoring  capability  (three 
staff),  no  computer  capability,  no  data  people, 
and  no  "organized  community  system."  (In 
1982  the  Office  had  43  positions.)  Power  was 
located  largely  at  the  county  level. 

Existing State data and reports suggested to State staff that "there was no equity in the (expenditure) base." A Pennsylvania Mental Health and Mental Retardation Act of 1966 had required that funds be allocated by

a complex formula which took into consideration the population of the county, the number of persons receiving services, and the level of poverty in the county. The formula, unfortunately, did not take into consideration the preexisting service delivery pattern. Providers were congregated in urban areas and most funds were already in the most populous counties. The proposed formula failed to deal with this issue and after much public outcry was withdrawn (Hadley et al. 1983).

The formula "didn't fly politically." Several State officials claimed that past unbalanced growth patterns had resulted partly from "favored political treatment," especially of a few large urban areas.

During this time, county-to-state reporting requirements were in place "theoretically," but compliance was low and the data were of poor reliability. According to one official, Pennsylvania had bought into a "traditional planning philosophy" that had generated "giant documents which were filled with . . . (expletive deleted)."


In November 1978, a new Governor was elected. A new Secretary of Public Welfare was appointed. The incumbent Deputy Secretary (Commissioner) for Mental Health left, and an Acting Deputy Secretary took charge. The Director of the Office of Community Programs spent considerable time briefing the new Secretary and discussing the need for and ways to obtain better data. In time, the Secretary, an economist by training, called on the Department for better ways to allocate State funds. She requested that a performance measurement system for both mental health and mental retardation programs be installed for the next budget allocation cycle. That cycle was to begin in only about 2 weeks, so the community program staff was forced into "12-hour days." Working from ragged, partial data, they forged a crude first approximation to county performance comparisons for mental health services. They "used anything we could find" and were forced to make all calculations by hand with a desk calculator. (Performance score calculations and rankings are now done by computer.)

State allocations of new budget increases were based on the indicator rankings calculated by the State. This departure from past methods created some new "winners" and "losers." The allocations were apparently met with an "enormous furor," a flood of calls, and political protests to the new Secretary of Public Welfare from aroused legislators. She backed the Office of Mental Health and its allocations and "held the line." Protests continued for the first several years, but a more refined system is now part of the established landscape.

Interviews suggest that the reasons for developing performance indicators in the first place included a mixture of at least the following interdependent desires:

• To rectify imbalances and per capita inequities throughout the State that had accumulated from past allocation practices

• To redirect funds to services for the chronically mentally ill and other State priorities

• To increase local "accountability" to the State for the use of State funds

• To increase the power and influence of the State Office of Mental Health over county programs




Performance  Factors 

Pennsylvania's approach has apparently rested on six principles:

• Rely on existing data.
• Keep the number of performance indicators "small."
• Keep the indicators relatively "simple."
• Keep the overall system "flexible" by piloting new factors, dropping a few old ones, and modifying others.
• Focus the use of indicator information on the allocation of State money, in the belief that "dollars drive the system."
• Make part of the indicator score depend on the timeliness and completeness of the reports and data on which the whole indicators approach depends.

Much of the Pennsylvania Performance Indicator approach has been written down (Hadley 1982). Even so, one heavily involved official reported that a lot still remains "in my head," and that teaching its full operation would take several days, if not weeks. The basic concepts of the approach appear simple, but their use and application are not. They are still partly learned by experience and transmitted by an "oral tradition." A full understanding rests on being familiar with the "details of the data system" and "where the counties are."

What began as a crude handful of measures had become, by fiscal year 1982-83, the eight "factors" described in Exhibit 1. The Pennsylvania approach used these factors to calculate a summary performance "z" score.

The  performance  factors  relate,  in  turn,  to 
about  nine  county  "cost  centers"  (areas  of 
expenditure): 

1.  Administrator's  Office 

2.  Community  Services  (C&E) 

3.  Case  Management 

4.  Outpatient  Services 

5.  Partial  Hospitalization 

6.  Emergency  Services 

7.  Vocational  Rehabilitation 

8.  Social  Rehabilitation 

9.  Residential  Arrangement 

Although most of these centers are relatively distinct, separate cost allocations for case management, outpatient, and emergency services are sometimes hard to make cleanly. In some instances they are aggregated for reporting.

The  eight  performance  factors  (plus  three 
needs  factors)  are  narratively  defined,  de- 
scribed and  illustrated  in  over  eight  single- 
spaced pages,  part  of  a larger  set  of  instruc- 
tions that  exist  in  several  separate  pieces.  By 
the  end  of  1982,  they  had  not  been  consoli- 
dated in  a single  source,  making  a direct  view 
of  the  Pennsylvania  procedures  difficult. 


Exhibit 1. Pennsylvania performance factors, FY 82-83


Distribution of Dollars: This factor measures the deviation of a county's distribution of dollars from the average distribution. Its purpose is to encourage counties to provide a balance of types of services to patients.

Responsiveness to Need: This factor measures the extent to which a county directs its resources into acute or aftercare services given the relative need for services in those two areas. Its purpose is to reward counties which are appropriately responsive to needs.

Revenue Generation: This factor measures the level of revenue (Medical Assistance, client liability and third party insurance) generation in a variety of cost centers. Its purpose is to encourage counties to maximize the collection of revenues for services provided.

Service System Output: This factor measures the total face-to-face service unit production per dollar spent in a given county. Its purpose is to encourage counties to purchase less costly and less restrictive service alternatives.

Unit Cost: This factor measures the unit cost of services in outpatient and partial hospitalization services. Its purpose is to encourage clinical staff efficiency and the cost-effective provision of services.

State Mental Hospital Admissions: This factor measures the level of less-than-60-day admissions, readmissions, and 302 (emergency) admissions to the State hospitals, while controlling for number of aftercare clients in the community. Its purpose is to encourage counties to reduce their use of State hospitals.

Aftercare Follow-Along: This factor measures the followup of county programs for patients discharged from State hospitals. It will include a measure of rate and timeliness of followup. Its purpose is to encourage the entry of all former State hospital patients into community treatment as rapidly as possible.

Report Submission: This factor measures the timeliness of Expenditure Reports, Annual Plans, and CCR Reports, and the completeness of the Annual Plan. Its purpose is to encourage the submission of timely and complete reports.

Source:  Nelson  1981.  pp.  1-2,  verbatim.  Updated 
allocation  and  needs  factors  are  in  Nelson  1984. 
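Exhibit 1 gives no formulas. The Python sketch below is therefore only one hypothetical reading of the first factor. It assumes that a county's "deviation from the average distribution" is the total absolute gap between its share of dollars in each cost center and the average share across all counties. The function names, cost-center keys, and dollar figures are all invented for illustration.

```python
def shares(spending):
    """Convert a county's dollars per cost center into fractions of its total."""
    total = sum(spending.values())
    return {center: dollars / total for center, dollars in spending.items()}


def distribution_deviation(county_spending, all_spending):
    """Hypothetical deviation measure: the sum, over cost centers, of the absolute
    gap between this county's share and the average share across all counties."""
    county_shares = shares(county_spending)
    all_shares = [shares(s) for s in all_spending]
    centers = list(county_spending)
    average = {c: sum(s[c] for s in all_shares) / len(all_shares) for c in centers}
    return sum(abs(county_shares[c] - average[c]) for c in centers)


# Hypothetical dollars (thousands) in three of the nine cost centers.
county_spending = [
    {"outpatient": 500, "partial_hospitalization": 300, "case_management": 200},
    {"outpatient": 700, "partial_hospitalization": 200, "case_management": 100},
    {"outpatient": 300, "partial_hospitalization": 400, "case_management": 300},
]
print([round(distribution_deviation(c, county_spending), 3) for c in county_spending])
```

Under this reading, the first county sits exactly at the average mix and scores zero. The other two deviate equally, in opposite directions. A real factor would still need a sign convention and standardization before being added into a county's total.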




In the State's allocation package to counties, the State suggests ways a county can improve its performance scores. To improve the score on "Distribution of Dollars," a county "should concentrate on one or two cost centers causing the most trouble." To improve the "Responsiveness to Need" score, "... if you have high acute care need but low aftercare need, you should spend proportionately more of your resources (relative to other counties) in acute care and less in aftercare (relative to other counties)." Individual "z" scores are calculated for each factor (based sometimes on the calculation of sub-factors). Twenty-nine pages of computational procedures are employed (Hadley 1982).

The State stresses that areas of priority are openly addressed in the presentation and discussion of indicators. A basic intention is to reduce per capita expenditure inequities among the counties. Other priorities include:

• Promoting services for the chronically mentally ill, partly through rewarding the provision of transitional care, aftercare, and follow-along

• Substituting outpatient and partial hospitalization services for State hospitals

• Encouraging the collection of revenues and fees from all available sources and increasing efficiency by a focus on "service system output" and "unit cost"

• Encouraging the timely submission of complete reports

Pennsylvania  does  not  ask  for  indicators 
directly  related  to  treatment  "outcome"  or 
services  effectiveness.  These  are  judged  to  be 
areas  where  methods  for  easy  routine  as- 
sessment do  not  now  exist.  Similarly,  though 
the  State  stresses  meeting  "needs,"  it  admits 
this  is  a complex  and  murky  area  where,  as  the 
Director  of  Community  Programs  stressed, 
"I'm  looking  for  almost  anything  that  would 
work." 

The State has piloted several additional measures over the past few years. Factors for fiscal year 1982-83 were:

Accessibility of Services: a gross comparison of the "number of clients in service."

Distribution of Services: the distribution of "units of service" (rather than dollars) within aftercare and acute care. (It would subsume the current factor "Distribution of Dollars.")

Average Units per Client: the number of "units of service the average aftercare or acute care client receives in each of the service areas" ("the fewer units per client, the better the score").

CRR (Community Residential Rehabilitation) Productivity: counties will be "progressively" rewarded for exceeding 85 percent bed occupancy.


Mechanics,  Scores,  and  Concepts 

Cycle: The preparation, examination, announcement, and use of performance scores in budget allocations take place over the 12-month budget cycle. The cycle has many steps and is iterative. It moves through tentative scores, announcements, and tentative budgets to final scores plus allocations, followed by "rebudgets" and supplementals.

Normal Curve and "z" Scores: The PMM approach assumes that county performance scores should follow a normal (bell-shaped) distribution. The State first calculates the average (mean) of all county scores. It then measures dispersion around the mean, with a cap at 2.0 standard deviations to eliminate the influence of aberrant extreme scores. Performance scores are then grouped by level (six groups in FY 1982-83) on the basis of State judgment. The State decides how much money to distribute as a new increment and sets a declining percentage increase for each group. Usually the bottom group or two receive little or no increase. In fiscal year 1982-83, increases ranged from 2.0 to 6.5 percent. Once final scores have been established, a cast-iron rule of no changes applies unless a data error was involved. This rule reportedly keeps the whole process from unraveling into a series of endless changes.
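The scoring and allocation steps just described can be sketched in a few lines of Python. The sketch is not Pennsylvania's own procedure and rests on several stated assumptions. It reads the 2.0 cap as clipping each county's z score to the range -2.0 to +2.0. It assumes every factor has already been oriented so that a higher raw value is better. It splits counties into equal-sized groups by rank, whereas in practice the State set the group boundaries by judgment. It spaces the group percentages evenly between 2.0 and 6.5 percent. The function names, factor keys, and county data are hypothetical.

```python
from statistics import mean, pstdev


def capped_z_scores(values, cap=2.0):
    """Standardize raw factor values across counties, clipping each z to [-cap, cap]."""
    mu = mean(values)
    sd = pstdev(values)  # population SD: every county is included, not a sample
    if sd == 0:
        return [0.0] * len(values)
    return [max(-cap, min(cap, (v - mu) / sd)) for v in values]


def total_scores(factor_table, cap=2.0):
    """Add each county's capped factor z scores with equal weights.

    factor_table maps a factor name to a list of raw values, one per county,
    always in the same county order. Each factor is assumed to be oriented
    so that a higher value means better performance.
    """
    n_counties = len(next(iter(factor_table.values())))
    totals = [0.0] * n_counties
    for values in factor_table.values():
        for i, z in enumerate(capped_z_scores(values, cap)):
            totals[i] += z
    return totals


def allocate_increases(counties, totals, n_groups=6, low_pct=2.0, high_pct=6.5):
    """Rank counties by total score; each rank group gets a declining percentage."""
    ranked = sorted(zip(counties, totals), key=lambda pair: pair[1], reverse=True)
    step = (high_pct - low_pct) / (n_groups - 1)
    increases = {}
    for rank, (county, _) in enumerate(ranked):
        group = rank * n_groups // len(ranked)  # 0 is the top group
        increases[county] = high_pct - group * step
    return increases


# Hypothetical data: six counties, two factors already oriented so higher is better.
counties = ["A", "B", "C", "D", "E", "F"]
factors = {
    "revenue_generation": [1, 2, 3, 4, 5, 6],
    "service_output": [2, 2, 3, 3, 4, 4],
}
totals = total_scores(factors)
print(allocate_increases(counties, totals))
```

In this sketch the State's judgment is reduced to two parameters, the number of groups and the percentage range. That shows how much of the outcome depends on choices made outside the formula.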

Relative Standards and Competition: By using relative performance standards based on intercounty comparisons, Pennsylvania has induced competition among county mental health programs on performance scores. This is a significant and innovative feature. It has strong appeal and appears, in principle, to be a desirable element of a performance incentive system. Its use and consequences deserve further attention, discussion, and analysis. Some counties reportedly feel uncomfortable being compared to other counties.




The "Mean" as a Statewide Standard: For two performance factors (Distribution of Dollars and Responsiveness to Need), the PMM approach assumes that county programs should move toward the State average performance. Extreme performance on these two factors is penalized, whether high or low. The optimal scoring zone is reportedly within 1 standard deviation of the mean. This practice, like the use of the normal curve, has great simplifying power and considerable implications for program development. Some counties criticize it. Does the "average" in a sparse small county mean the same thing as the "average" in Pittsburgh or Philadelphia? Is a State "average" a desirable target?
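On these two factors, distance from the mean in either direction counts against a county. The Python sketch below illustrates one way such a two-sided standard could work. It assumes that a county loses nothing while it stays within 1 standard deviation of the State mean and loses points in proportion to any further deviation, with the same 2.0 cap. The penalty function is an assumption, since the actual Pennsylvania function is not described here. The function name and the sample data are also hypothetical.

```python
from statistics import mean, pstdev


def two_sided_scores(values, zone=1.0, cap=2.0):
    """Score a factor on which the State mean is treated as the target.

    Counties within `zone` standard deviations of the mean score 0. Beyond
    that, the score drops by the excess deviation, capped at `cap`.
    The penalty shape is a hypothetical illustration.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    scores = []
    for v in values:
        distance = min(abs(v - mu) / sd, cap)  # distance from the mean, in SD units
        penalty = max(0.0, distance - zone)    # nothing is lost inside the optimal zone
        scores.append(0.0 - penalty)
    return scores


# Hypothetical percent of dollars going to aftercare, for five counties.
high_outlier = two_sided_scores([10, 10, 10, 10, 40])
low_outlier = two_sided_scores([40, 40, 40, 40, 10])
print(high_outlier, low_outlier)
```

The county spending far above the average and the county spending far below it receive the same penalty. This is the property that some counties question.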

"The Floating Mean": Total county performance scores (z) are established relative to other counties. Performance changes from county to county at different rates and in different directions from year to year, and the factors on which scores are based are modified somewhat from year to year. For all three reasons, the "mean" floats from one year to the next. As a result, the relative position of a given county cannot be precisely forecast. This gives the scoring and ranking process a dynamic quality and may induce a degree of suspense about final scores.
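A small hypothetical Python example shows the floating mean at work. A county whose raw value is unchanged from one year to the next still sees its z score fall when other counties improve, because the mean and standard deviation are recomputed each year. The numbers and the function name are invented for illustration.

```python
from statistics import mean, pstdev


def z_score(value, all_values):
    """z score of one county's raw value relative to all counties that year."""
    sd = pstdev(all_values)
    return 0.0 if sd == 0 else (value - mean(all_values)) / sd


# Hypothetical raw values for five counties; county C scores 60 in both years.
year1 = [40, 50, 60, 70, 80]
year2 = [55, 65, 60, 75, 85]  # every other county improved

z1 = z_score(60, year1)  # C sits exactly at the mean
z2 = z_score(60, year2)  # same raw value, now below the mean
print(round(z1, 3), round(z2, 3))
```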

Discriminating Versus Nondiscriminating Factors: The State reports that the variation among performance scores on at least one factor (Report Compliance) has grown so small (reporting has improved across the board) that even slight variations in performance (a report a day or a week late) can throw a county's score far off. Some change will probably be made in the calculation or use of this factor to eliminate its disproportionate influence.
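The arithmetic behind this problem is simple. A z score divides each difference by the standard deviation. When nearly every county reports on time, the standard deviation is tiny, and a one-day difference becomes a large jump in z units. In the hypothetical Python example below, the one late county among 20 lands more than four standard deviations from the mean. The lateness data (days late) and the function name are invented for illustration and are not the State's data.

```python
from statistics import pstdev


def z_units_per_day(days_late):
    """Size of a one-day difference in lateness, expressed in z units."""
    sd = pstdev(days_late)
    return float("inf") if sd == 0 else 1.0 / sd


# Hypothetical: 19 counties on time, one county a single day late.
nearly_uniform = [0] * 19 + [1]
# Hypothetical: lateness spread evenly from 0 to 19 days.
widely_spread = list(range(20))

print(round(z_units_per_day(nearly_uniform), 2))  # one day is a big jump
print(round(z_units_per_day(widely_spread), 2))   # one day barely registers
```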

Equal Factor Weighting: The State weights all factors equally in calculating the overall county performance score, simply adding the scores of the eight factors. To suggestions that factors be weighted, the Director of Community Programs asks: "On what basis should they be weighted?" In fact, as he points out, the services and target groups at the focus of this PMM approach are already weighted. Some groups appear and others do not. Some services are encouraged (e.g., aftercare) and others are not (e.g., inpatient care).
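The Python sketch below shows why the weighting question matters, using two hypothetical counties and two hypothetical factor names. With equal weights, as in Pennsylvania's simple sum, one county ranks first. Giving one factor extra weight reverses the order. The weights are arbitrary, which illustrates the Director's question about what basis any weights would have.

```python
def total_score(factor_z, weights=None):
    """Weighted sum of a county's factor z scores; no weights means equal weighting."""
    if weights is None:
        weights = {name: 1.0 for name in factor_z}
    return sum(weights[name] * z for name, z in factor_z.items())


# Hypothetical factor z scores for two counties.
county_a = {"revenue_generation": 1.5, "unit_cost": -0.5}
county_b = {"revenue_generation": -0.2, "unit_cost": 0.8}

equal = (total_score(county_a), total_score(county_b))  # A ahead: 1.0 vs about 0.6
tilted_weights = {"revenue_generation": 1.0, "unit_cost": 3.0}
tilted = (total_score(county_a, tilted_weights), total_score(county_b, tilted_weights))
print(equal, tilted)  # B moves ahead once unit cost counts triple
```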

Face Validity: The eight factors are justified partly on their face validity. No demonstrated relationship links them to patient outcome.


"Losers": The system seems to be designed and employed so that there are always some winners and some losers. Wisely, though, there are not too many losers and no "big losers" or "big winners." Losers get mad. Big losers may get very mad. If indicators showed a 25 percent loss for Philadelphia, for example, would the State allocate on that basis? In general, practical political limits constrain the use of PMM approaches.

Judgment: The system is characterized as running "by the numbers," yet it involves a considerable amount of informed judgment and wisdom about what county programs are like and why indicator data change. Judgment is involved in the selection and design of indicators and in the interpretation of indicator results. Experience and judgment are used to spot and correct artifacts and radical changes in indicator scores that might result in arbitrary and undesirable intercounty comparisons. Here, as in many budget and management data systems, judgmental adjustments and "fudge factors" are essential to make the system work reasonably.

Brief  Summary 

The Pennsylvania PMM approach was developed rapidly and imposed on county programs by the State when a "political window" opened to an innovative official during a change in State leadership. A Cabinet secretary provided sustained "support from the top." The system is wisely used to allocate budget increments and not the entire budget base. It uses only a few indicators, with relative and normalized scores. The approach rests openly on the premises that "money drives the system" and that "data improves when it is used." The approach is simple in concept but not in operation. It is not wholly clear which phases operate "by the numbers" and which rest on the use of reasonable and political judgment.

Tennessee's Performance Measurement System in Brief

Prior to the development of its own PMM approach, Tennessee had generally endorsed the 19 standards areas set out in Federal guidelines for the Community Mental Health Centers Program. When block grant legislation was passed in 1981, Federal standards were dropped and the State "started over." The mental health agency devoted 3 to 4 months to vigorous internal debate over standards, in what one official called "a basic goals and values clarification process." Painful, heated, and prolonged, the process resulted in draft State standards. Meanwhile, the Tennessee Association of Mental Health Centers had been developing in parallel its own version of standards, which it proposed not only for monitoring and funding but for accreditation as well. Several joint meetings between the State and the Center Association resulted in tradeoffs and compromises and led to standards organized into 12 basic performance domains (types of administrative and service activities), shown in Exhibit 2.

Exhibit 2. Framework of Tennessee program standards

Domain                                      Number of indicator areas

Administrative
  I.     Clinical Records                              8
  II.    Governing Body/Advisory Groups                4
  III.   Quality Assurance                             4
  IV.    Disaster Planning                             1
Activities
  V.     Case Management                               5
  VI.    Crisis Stabilization                          3
Program
  VII.   Outpatient Services                          16
  VIII.  Emergency Services                            9
  IX.    Day Treatment                                 4
  X.     Transitional/Residential                      7
  XI.    Forensic Services                             8
  XII.   Consultation and Education                    4

Total                                                 73

Within these domains, 73 indicator areas are then specified. Each indicator area is elaborated in the following terms:

Intent: the purpose, or the intended end product or condition desired.

Indicator: the index or metric to be used to measure the intent.

Criteria: the expected value of the indicator.

Measurement: the method to be used to gather the data to assess the criteria.


Programs are judged either in or out of compliance with each standard (Tennessee Department of Mental Health and Mental Retardation 1982).
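The structure described above can be sketched as a small data model. It has twelve domains, and each indicator area is defined by intent, indicator, criteria, and measurement. Each standard gets a simple in-or-out compliance judgment. This is a minimal Python illustration, not Tennessee's actual system. The class `IndicatorArea`, the function `compliance_report`, and the sample area text are hypothetical; only the domain names and counts come from Exhibit 2. Consistent with the text, the sketch reports compliance standard by standard and computes no total score.

```python
from dataclasses import dataclass

# Indicator areas per domain, transcribed from Exhibit 2.
EXHIBIT_2_COUNTS = {
    "I. Clinical Records": 8,
    "II. Governing Body/Advisory Groups": 4,
    "III. Quality Assurance": 4,
    "IV. Disaster Planning": 1,
    "V. Case Management": 5,
    "VI. Crisis Stabilization": 3,
    "VII. Outpatient Services": 16,
    "VIII. Emergency Services": 9,
    "IX. Day Treatment": 4,
    "X. Transitional/Residential": 7,
    "XI. Forensic Services": 8,
    "XII. Consultation and Education": 4,
}
assert sum(EXHIBIT_2_COUNTS.values()) == 73  # the 73 indicator areas in the text


@dataclass(frozen=True)  # frozen makes instances hashable, usable as dict keys
class IndicatorArea:
    domain: str       # one of the 12 domains in Exhibit 2
    intent: str       # purpose, or intended end product or condition
    indicator: str    # index or metric used to measure the intent
    criteria: str     # expected value of the indicator
    measurement: str  # method used to gather data to assess the criteria


def compliance_report(findings):
    """Group areas as in or out of compliance; no total score is computed."""
    report = {"in compliance": [], "out of compliance": []}
    for area, complies in findings.items():
        key = "in compliance" if complies else "out of compliance"
        report[key].append(f"{area.domain}: {area.intent}")
    return report


# Hypothetical example of a documentary-evidence indicator.
disaster_plan = IndicatorArea(
    domain="IV. Disaster Planning",
    intent="Center is prepared to respond to a disaster",
    indicator="Written disaster plan",
    criteria="Plan exists and has been reviewed within the past year",
    measurement="Plan inspected during the site visit",
)
print(compliance_report({disaster_plan: True}))
```

The dictionary of findings mirrors the binary judgment in the text. Adding a score would require weighting decisions that, according to this description, the State had not made.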

Because it is still under development, the system has many "blanks." Few indicators are quantitative. Rather, most are based on a show of documentary evidence (e.g., a plan) or on-site observation (e.g., an inspection of a client log). No total performance score is calculated. The State intends to apply the standards through regular data collection and a site visit. In time the State intends to integrate standards with performance contracting and evaluation.

Some of the factors that contributed to the current approach included:

• Leadership changes brought new actors into the Division who favored a stronger role in implementing State priorities and holding centers accountable.

• The State budget was getting tighter due to revenue shortfalls.

• The State legislature wanted better answers about the expenditure of mental health funds.

• The State agency wanted to replace State-endorsed Federal standards with new standards oriented more to outcome and away from process, and to give them "teeth."

• As the State investment in local mental health services grew, the State became more concerned that local programs implement State priorities.

• Federal block grant legislation provided a major opportunity to emphasize State program philosophies, priorities, and standards.

Colorado's Performance Measurement System in Brief

Colorado has a multi-element PMM system, including a statewide management information system, performance contracts with local service programs, and a systematic annual site assessment of these programs (Miller and Wilson 1981). A performance contract is the principal instrument through which resources are allocated in accordance with selected aspects of service performance.

A preliminary performance contracting system was introduced in fiscal year 1978-79 and formalized the next year. New State leaders wanted to improve accountability by focusing contracts on "specific outcomes" and "tasks" negotiated by the center and the State.

A contract format was negotiated with the State's CMHC and Clinics Association. Experienced at negotiating, the State had prepared a "careful strategy" and elected to introduce the system from the top and rather quickly. By contrast, centers were apparently not accustomed to negotiating and were relatively unprepared for it. The result was a "head-on collision," which generated tension and conflict; centers felt betrayed. The State was then obliged to cope with the centers' negative feelings and bring them into a negotiating mood. Subsequent compromises based on tradeoffs and mutual concessions were reached, and a general contract format was established.

The performance contract is negotiated in two phases. First, the Division and a committee of the State CMHC and Clinics Association negotiate to establish the specific contract format and conditions that will apply to all centers. Second, a State team meets with center representatives to review center proposals, principally those pertaining to the expected level of admissions of selected target groups. Although some of this review is reportedly pro forma because center contracts share uniform features, target admissions figures are negotiated in varied ways.

A typical contract consists of about a dozen single-spaced, legal-sized pages that specify the respective responsibilities of the State and the center (State of Colorado, n.d.). The contract contains quantitative performance measurements of:

• The number of admissions by four age groups (0-10; 11-17; 18-59; 60+)

• The number of minorities (Native American, Asian Pacific, Black, and Hispanic) to be served

• The number of severely disabled (classified as serious, critical, or chronically mentally ill, as defined in State instructions) to be served

Target group admission levels are linked in the contract to "fiscal penalties" for failure to serve the agreed number of clients in each category. Centers are given several opportunities to renegotiate minimum client admissions without penalty. If a center is not within 10 percent of the target admissions after 5 months and again after 9 months, "it may submit an explanation and a plan for correction, plus a request for revised performance figures, if necessary" (Miller and Wilson 1981, p. 188).

If a center does not reach 93 percent of its target by the end of the fiscal year, "the Division may reduce funds proportionately in the next year's contract." Funding reductions are limited by the contract, however, to 5 percent of the center's total contract. This stop-loss provision was apparently negotiated into the contract format by the centers' association.
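The admissions checkpoints and the year-end penalty can be expressed as a short calculation. The Python sketch below is illustrative only, because the contract's exact formula is not given here. It makes two assumptions. First, the 5- and 9-month checks compare admissions with a target prorated by months elapsed. Second, "reduce funds proportionately" means reducing a hypothetical `funds_at_risk` amount by the percentage shortfall from target. The 10 percent tolerance, the 93 percent threshold, and the 5 percent stop-loss cap come from the text. The function names and dollar amounts are hypothetical. Integer comparisons avoid floating-point edge cases at the 93 percent boundary.

```python
def may_request_revision(admitted_to_date: int, annual_target: int,
                         months: int, tolerance_pct: int = 10) -> bool:
    """True if admissions at a checkpoint (5 or 9 months) are NOT within
    tolerance_pct of the prorated target, so the center may submit an
    explanation, a correction plan, and a request for revised figures.

    Proration by months / 12 is an assumption.
    """
    # admitted < (1 - tol) * annual_target * months / 12, in integer arithmetic
    return admitted_to_date * 100 * 12 < (100 - tolerance_pct) * annual_target * months


def year_end_reduction(admitted: int, annual_target: int, funds_at_risk: float,
                       total_contract: float, threshold_pct: int = 93,
                       cap_pct: int = 5) -> float:
    """Reduction in next year's contract under the assumptions stated above."""
    if admitted * 100 >= threshold_pct * annual_target:
        return 0.0  # reached 93 percent of target: no penalty
    # Assumed meaning of "proportionately": shortfall share of funds_at_risk.
    proportional = (annual_target - admitted) * funds_at_risk / annual_target
    cap = total_contract * cap_pct / 100  # stop-loss: at most 5% of the contract
    return min(proportional, cap)


print(may_request_revision(44, 120, 5))                 # True
print(year_end_reduction(90, 100, 400_000, 1_000_000))  # 40000.0
print(year_end_reduction(70, 100, 400_000, 1_000_000))  # 50000.0
```

Under these assumptions, a hypothetical center with $400,000 at risk that admits 90 of 100 targeted clients would lose $40,000. One admitting only 70 would hit the stop-loss cap of $50,000 on a $1,000,000 contract. This illustrates how the cap limits the State's leverage.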

State officials report that, with one exception pertaining to late data, the fiscal penalty provision had not been used, partly because there had not been adequate agreement on the reliability of the data base employed. The Division's Deputy Director reported that since data quality is now adequate, the penalty provision will be used actively in the future. Similarly, though the State has very good unit cost and service data, by the end of 1982 these data were not yet directly related to client admissions or to fiscal penalties.

Beyond admissions targets and fiscal penalties, the contract has major provisions pertaining to standards that centers must meet: independent, annual CPA audits of management services; assurances that clients benefit from care (to be assessed in the long run by statewide outcome monitoring); evidence of service to those most in need; quarterly utilization and fiscal data on a sample of clients; and reasonable reimbursement rates (Miller and Wilson 1981).

In addition to contracts, the State conducts annual site assessments that monitor a broad range of conditions, judging from a variety of evidence whether a center program "meets requirements." The site assessment instrument illustrates the State's desire for "soft" information to supplement "hard" (measured) aspects of program performance. The Colorado approach blends quantitative measurement with interactive monitoring.

In sum, the Colorado performance contracting instrument is relatively comprehensive and detailed in coverage. Its use to control the expenditure of contract funds is limited, however, because fiscal penalties are linked directly only to admissions targets and are capped at 5 percent for admission shortfalls. The terms of the contract do provide additional information that could be used to increase State control. The State intends to tighten and improve the performance contracting mechanism.


General  Observations  and  Conclusions 

1. The specific factors that contributed to the initiation of PMM (the mix and pattern of motives, internal and external incentives, and political-bureaucratic opportunities) varied among the three States. No two of the cases were either identical or closely similar.

2. Yet the three cases shared a few basic determinants:

a. Some officials were looking for ways to increase their understanding of the current uses of State funds by local programs.

b. Some officials were seeking additional means to direct resources toward local program services or target groups they felt were neglected and/or to influence local program operations and developments in the direction of State priorities.

c. Some officials saw accountability of local service agencies to the State as a justification for some form of PMM.

d. Some officials saw PMM as a way to increase State control over local service programs.

3. Identifiable individuals or small groups pressed for formal PMM approaches. The characteristics of these advocates (their competencies, personalities, and relationships with others) influenced the PMM systems significantly and in distinctive ways. People make technical or quasi-technical systems work.

4. In no case did any key actor imply that there is some final, ultimate, or right single set of indicators. Rather, modification based on experience occurred in all States. All experienced officials seemed to recognize that PMM approaches rest on a rudimentary state of the art.


5. A major function of PMM approaches is to focus attention on selected features of a service system. How this value-based selection occurs influences who feels "ownership" in the resulting PMM system and how fair the PMM system seems.

6. The PMM approaches in all three cases are vulnerable to both technical and philosophical criticism. Involved State officials readily acknowledged this. They describe their approaches not in terms of technical sophistication, elegance, or purity, but rather in terms of face validity, reasonableness, acceptability, and credibility on the one hand, and feasibility, workability, and practicability on the other.

7. All PMM approaches are value-influenced and value-embedded. Interactive methods to keep PMM systems responsive to changing politico-bureaucratic conditions and evolving service philosophy and ideology are important.

8. State officials saw PMM approaches as only one of an array of tools for administering service programs.

9. Data do not speak for themselves. They must be interpreted, with all the value screening, variable inference, and supplementary knowledge that normally go into interpretation. PMM systems are intended to generate information to induce changes. All PMM systems are thus "political," if only in the small "p" sense of the word.

10. All PMM systems are amenable to "gaming" by those who supply information on which their own performance is to be gauged. Once the logic (rules) of a system (game) is known, informed judgment (well short of blatant cheating) can be and often is used to generate favorable indicator results.

11. Since measurability is no index of importance, a performance monitoring system restricted to measurements will necessarily exclude issues that some legitimate stakeholders in the system consider important.




12. Unintended side effects (positive or negative) of a formal method are sometimes more important than the intended effects.

13. Finally, what does any particular set of indicators really indicate about the actual performance of a complex service system? The relationships between the actions of a public agency and changes in the mental health status of clients are intricate and still partly obscure and problematic. As one local official put the point plainly: "After all, what the State is observing is the behavior of the numbers in our performance reports, not the behavior of the mental health care system."


References 

Hadley, T.R. Memorandum from Pennsylvania Office of Mental Health to County Administrators, "Data for Performance Measures," January 1982. Attachments 2, 4, and 6 (82-83 Performance and Needs Factors).

Hadley, T.R.; Wilcox, J.T.; Rossman, G.R.; and Nazar, K. Performance standards and allocation of funds in community mental health programs: The Pennsylvania system. Administration in Mental Health 10:155-161, 1983.

Kimmel, W.A. Performance Measurement and Monitoring in Mental Health: Selected Impressions From Three States. Contract report to the Office of State and Community Liaison, National Institute of Mental Health. Available from National Technical Information Service (Accession No. PB-199802), 1982.

Miller, S., and Wilson, N. The case for performance contracting. Administration in Mental Health 8:185-193, 1981.

Nelson, S.H. Performance and need factors for fiscal year 1982-83 allocations. In: Mental Health Bulletin No. 5000-81-02. Pennsylvania Office of Mental Health, October 26, 1981.

Nelson, S.H. Allocation and Needs Factors for FY 85-86. In: Mental Health Bulletin No. 99-84-34. Pennsylvania Department of Public Welfare, August 10, 1984.

State of Colorado. Contract form 6-AC-02A. n.d.

Tennessee Department of Mental Health and Mental Retardation. "Community Mental Health Centers Standards," April 1982.




Unintended  Effects  of  Program  Performance  Measurement  in  Pennsylvania 


Pauline  E.  Ginsberg 
Utica College of Syracuse University


Introduction 

With regard to mental health budgeting, the Commonwealth of Pennsylvania is unique in that, since fiscal year 1979-80, performance indicators have been used directly to allocate State funds to county mental health programs. Although the details vary from year to year, the system involves: (1) county submission of quantitative data in 7 to 12 categories, (2) State calculation of z-scores for each performance factor and of their total to reveal each county's overall relative performance, and (3) State allocation of budget increases in accord with relative performance scores.
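Step (2) can be illustrated with a short Python sketch. It standardizes each performance factor across counties as z = (x - mean) / standard deviation and sums the z-scores into a composite for each county. Several details are assumptions not specified in this section:

- the use of the population standard deviation (`statistics.pstdev`);
- reversing the sign for factors where a lower value is better, such as a unit cost;
- skipping a factor on which all counties are equal;
- the county names, factor names, and data.

How composite scores were translated into dollar increments is not modeled here.

```python
from statistics import mean, pstdev


def composite_z_scores(data, lower_is_better=frozenset()):
    """Sum per-factor z-scores for each county.

    data maps county -> {factor: value}; every county is assumed to report
    the same factors. Factors named in lower_is_better have their sign
    reversed so that a higher composite always means better relative
    performance (an assumption for illustration).
    """
    counties = list(data)
    factors = list(next(iter(data.values())))
    totals = {county: 0.0 for county in counties}
    for factor in factors:
        values = [data[county][factor] for county in counties]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every county is equal on this factor: no information
        sign = -1.0 if factor in lower_is_better else 1.0
        for county in counties:
            totals[county] += sign * (data[county][factor] - mu) / sigma
    return totals


# Hypothetical data for three counties and two factors.
reports = {
    "A": {"aftercare_pct": 50, "unit_cost": 30},
    "B": {"aftercare_pct": 70, "unit_cost": 20},
    "C": {"aftercare_pct": 90, "unit_cost": 40},
}
totals = composite_z_scores(reports, lower_is_better=frozenset({"unit_cost"}))
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)  # ['B', 'C', 'A']
```

Each factor's z-scores sum to zero across counties, so the composites do too. A county's score therefore moves when other counties change, even if its own figures do not. This is consistent with the method yielding relative rather than absolute judgments of performance.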

From the point of view of State officials, this method is a vast improvement over previous methods of allocating State funds to county programs. Previously there was no "objective" means of evaluating county budget requests, and it was suspected that rewards were distributed on the basis of astute politics and manipulation of power rather than good management or efficient delivery of services. Performance indicators provided both an objective tool to aid budgetary decisionmaking and a means by which the State could encourage aspects of mental health care consistent with State policy and State managerial efficiency. The State role is that of overseer and provider of regulations and policies relative to the Pennsylvania Mental Health/Mental Retardation Act of 1966, while the county role is that of broker, contracting with providers to assure service availability.* In accomplishing its intended effects, the method was exemplary (Hadley et al. 1983).

*I am grateful to Dorothy Fulton, Acting Director, Office of Community Programs, Office of Mental Health, Commonwealth of Pennsylvania, for clarification of the role relationship.

Yet the student of bureaucracy must also be a skeptic, one who looks at the unintended effects of regulation and policy along with the accomplishment of explicit goals. The lessons taught by the sociologist Peter Blau and the social psychologist and methodologist Donald T. Campbell should not be ignored. In Blau's The Dynamics of Bureaucracy (1955), administrators of a State employment agency were delighted to find an objective method of measuring staff performance. When a single indicator was insufficient, they added another and were content. Their anticipated goal had been met. The observer, Blau, however, noted unanticipated effects of the change. Some were beneficial, for example, a reduction of racial bias. Others, such as staff hoarding of job opening notices, were less so. Still others (for example, a change in the definition of "job placement") were neither good nor bad in themselves but made assessment over time more difficult.

It is this last category of side effect upon which Campbell (1979) commented:

The more any quantitative social indicator is used for decisionmaking, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor (p. 85).

In the Blau example above, a number of changes in the process of job placement occurred. Pennsylvania's system of tying allocations to indicators seems a clear candidate for similar distortion of both indicators and social processes.

In this regard it must be understood that Pennsylvania's uniqueness lies in its use of well-defined indicators that are clearly linked to funding, not in the response to those indicators. In fact, it is Pennsylvania's very success in implementing performance measurement that suggests it as a test case for a theory of unanticipated side effects of indicator use. Unanticipated side effects tend to be a persistent problem (see Ginsberg, this volume). When they occur as a response to vague indicators that are indifferently linked to consequences, however, side effects are all too easily attributed to faulty indicators or linkages. Pennsylvania's careful use of allocation indicators in mental health prevents such an easy dismissal of the problem.

To examine possible side effects, it is necessary to look beyond the demonstrable progress cited by the Office of Mental Health to the level of county programs and, beyond that, to actual service delivery. A qualitative approach to doing so was initiated during the summer of 1983.


Method 

Interviews were attempted with the administrators (or their designees) of all 43 county-level programs. Conducted in person or by telephone (following an explanatory letter), the interviews focused on "... your own experience with the performance factors." Respondents were promised anonymity and asked, "How has your county mental health delivery system changed due to their use?"

Data were obtained for 37 (86 percent) of the programs. Interview length varied from under 15 to over 40 minutes. A few programs sent supporting documents; a few administrators included other staff members in the conversation. Most typical was a 15- to 20-minute, relatively unstructured telephone conversation with an administrator who expected the call but had made no special preparations for it. Usually little prompting was needed. If interviewees seemed at a loss regarding changes that might be attributed to performance indicators, the interviewer noted that some counties had revised their recordkeeping systems and/or computerized them in order to provide data and asked whether this might be the case in their area. If service delivery was not mentioned spontaneously, this too was brought up. When no changes were noted, the interviewer asked specifically about aftercare follow-along, since change in this area was virtually mandated.


Results 

County program administrators endorsed the current method of allocating mental health budget increases on the basis of performance as an improvement over earlier, more overtly political methods. However, most administrators objected to specific aspects of the indicators or their use. These objections ranged from statistical to theoretical and political. Often they were accompanied by concrete suggestions and specific details. A vocal minority expressed overwhelmingly negative opinions. Their objections focused on the Commonwealth's Mental Health Act, which assigned responsibility for mental health care to the counties. The establishment of statewide priorities and sanctions for nonconformity to these priorities was seen as usurpation of county rights and responsibilities.

The information solicited in the interview concerned changes made in county programs in response to the indicators. The changes reported were tallied according to whether they affected service delivery or recordkeeping (see table 1).


Table 1. Number of county programs making changes in two areas in response to the State indicator system

Reported response to indicators                                    County programs

No change                                                                  3
Change in recordkeeping only                                              10
Change primarily in recordkeeping                                          6
Changes about equally distributed between recordkeeping and service        7
Change primarily in service                                                2
Change in service only                                                     4
Unclear whether change has occurred                                        5

Total                                                                     37

Lack  of  Changes 

Three counties reported having made no changes. Their reasons for lack of change varied little: they simply felt powerless. A typical comment was: "The idea's not bad, but we're not as flexible as the State and can't change from year to year."

The need to confront the realities of local politics and the amount of time necessary to develop new services in rural areas were cited by two of the three. The third despaired of making the right changes in a situation where all counties and State criteria were also changing and, hence, preferred to make none.

Along with counties that made little change, some nonchangers noted that their desire to change in the direction indicated by the State was inhibited by their confusion over just what was wanted. Said one administrator, "z-scores are not entirely understood, so it's hard to know what to manipulate to improve." Large changes in rank mystified them:

• Our county went from [high] to [low]. We don't know how this happened.

• Our mental health score improved drastically. . . . We didn't do anything to accomplish this much. . . . We laughed.

One county expressed indifference based upon the absolute size of the budget increases involved for small counties.

More common, however, was considerable soul-searching among counties that felt that their changes had been minor. State and county priorities were weighed against each other with their attendant financial implications. Even when the decision was to avoid or minimize change, county administrators sometimes found their struggle to work through the conflict helpful. They were able to look at their own programs critically. As one put it:

When what is intended programmatically goes against what's best for the performance indicators and, therefore, the budget, we have to rethink. We can always say 'no' to the State, if it's best for the county.

These  counties  have  not  necessarily  lived 
happily  ever  after.  Said  one  administrator, 
"I've  blown  off  points  by  putting  clients  first. 
...  I refuse  to  play  games  or  let  client 
services  be  interfered  with.  . . . But  now  with 
only  _ percent  increase,  for  the  sake  of  a few 
clients,  all  will  suffer." 

Changes  in  Recordkeeping 

When relatively large changes in county mental health operations were made as a response to the performance indicators, they were most often made in recordkeeping. Ten counties reported changes only in recordkeeping, and of the 29 reporting change of any sort, only 4 reported changes that did not involve recording. The types of changes varied. Computerization was reported in four instances, with a fifth considering this innovation. Implementation or updating of management information systems was reported by six programs. One county reported hiring a statistical analyst, while nine reported general improvement in recordkeeping and/or changes in recording forms. Improvement in the timeliness of the annual plan was the specific data-related change most frequently mentioned (five counties). Changes in categorization of services to conform to State priorities and to perform better on indicators were mentioned by 10 programs, which varied as to the changes made and the indicators targeted.

Four counties reported having passed performance indicators on to contracting agencies in one form or another, and four more reported themselves in the process of doing so. Nine additional counties had made substantial efforts to explain the constraints that performance indicators place on the county and the implications of those constraints for service providers:

• [Performance indicators] give us an excuse to pass things on and blame it on the State.

• When we want something from clinics, we tell them it will affect z-scores and we'll all suffer [unless they comply].

A minority explicitly resisted this shift of responsibility and rejected the idea of passing on the indicators in any form. One administrator explained, "Part of my job is to buffer all the garbage I get from the State. My job is to coordinate and I do it my way."

Whether originating at the service delivery level or in the county office, changes in categorization were most often related to revenue generation and/or distribution of dollars and/or services. Many such changes are undoubtedly salutary. Since they reflect more attention to which services were given to whom, they also produce more accurate billing procedures and information regarding service use.

Yet changes in recording procedure should not be mistaken for changes in service delivery. One change in reporting that is distinct from services is double counting. The practice occurs in several forms. In one, some clients are counted as both outpatients and case management or emergency clients for the same visit. This double counting meets requirements for the distribution of service and generates more revenue. Another county records the visits of Medicaid-eligible clients as both outpatient (reimbursable) and case management (non-reimbursable), regardless of the actual service offered. The outpatient status is used to bill Medicaid; the case management status is reported to the Office of Mental Health. In all, four counties explicitly reported such practices, while others hinted at them. One administrator summarized his rule of thumb:

If outpatient is face to face and case management is face to face, call it outpatient and then it's billable.

Said  another: 

We  end  up  chasing  brownie  points  and  to 
hell  with  the  program.  ...  If  they  want 
revenue  generation,  that's  what  they'll 
get. 

A third  remarked: 

[It's all] management by paper. They send it to you. You send it back. Whatever the reality is, they don't know. People can twist and fudge and never get caught. ____ has had a bureaucrat tell him to say he's doing something whether he is or not, just to keep out of trouble.

Such  manipulations  are  a matter  of  serious 
concern  for  some: 

In order to do well on the indicators, I must teach my staff to lie. This is a moral problem. The State advises counties to bill Medicaid for psychotherapy when case management is face-to-face. But this isn't psychotherapy and is quasi-illegal also. Otherwise the county must change the service, that is, give outpatient care where case management is needed. This is also a moral problem.

A lot of counties have played with data. I won't because I'm afraid of being caught in a double-bind [where even I won't be able to keep track of my own program.]

Other minor manipulations occur in which equipment costs, for example, are moved from the cost center where the equipment is actually used to one that is better able to bear the economic burden. The same paper manipulation is sometimes used with staff. In a somewhat more complex maneuver, one county placed a salaried staff member in a social program that had been operated solely by volunteers. There were two advantages. Service units that previously, under a "no dollars, no units" rule, did not qualify could now be reported. In addition, the county could bill Medicaid for some clients' services by calling the care "outpatient" (Medicaid reimbursable) instead of or in addition to "social rehabilitation" (Medicaid non-reimbursable). Overall, the problem of data manipulation is well summarized in the words of one county administrator who remarked that although the State might believe it "is getting the results it wants and getting us in line, it may just be the numbers."
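The double counting described in this section can be made concrete with a short sketch. It assumes a hypothetical visit log in which each record carries a client identifier, a date, and a reported service category; the field names, function name, and sample records are invented for illustration and do not reflect the format of any actual county or State reporting system. Grouping records by client and date shows how a single face-to-face contact can surface as two units of service.

```python
from collections import defaultdict
from typing import NamedTuple


class VisitRecord(NamedTuple):
    # Hypothetical fields; the source does not describe actual county record formats.
    client_id: str
    date: str      # ISO date string, e.g. "1983-10-04"
    service: str   # reported service category


def find_double_counted(records):
    """Return {(client_id, date): sorted services} for contacts reported under 2+ categories."""
    services_by_contact = defaultdict(set)
    for rec in records:
        services_by_contact[(rec.client_id, rec.date)].add(rec.service)
    return {
        key: sorted(services)
        for key, services in services_by_contact.items()
        if len(services) > 1
    }


# Hypothetical sample: client A's single visit is reported both ways.
log = [
    VisitRecord("A", "1983-10-04", "outpatient"),
    VisitRecord("A", "1983-10-04", "case management"),
    VisitRecord("B", "1983-10-04", "outpatient"),
]

units_reported = len(log)
contacts = len({(r.client_id, r.date) for r in log})
print(units_reported, contacts)    # 3 2
print(find_double_counted(log))    # {('A', '1983-10-04'): ['case management', 'outpatient']}
```

Matching on client and date is a crude rule under these assumptions: a client who genuinely received two distinct services on the same day would also be flagged, and a county that sends one code to Medicaid and the other to the Office of Mental Health could be detected only by linking the two reporting streams.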

Changes  in  Service  Delivery 

Some changes in actual service delivery do occur. Four programs reported changes only in service delivery. Two more changed primarily in service delivery, and an additional seven changed about equally in recordkeeping and service.

For the four that changed only in service delivery, two reported addition to or expansion of Social Rehabilitation (SR) or Vocational Rehabilitation (VR) programming. However, one of the two reported that since this failed to change indicators enough to increase the budget substantially, funds were now spread too thinly, and part of the new service would have to be dropped. The other expanded SR and VR at the expense of outpatient and case management, laying off old staff and shifting funds to the new services. The third accented aftercare, taking funds from outpatient and prevention, while the fourth in this group reported keeping clients in partial hospitalization longer because of that service's revenue-generation capacity. Similar action was taken by another program with regard to community residence. In order to keep occupancy rates up, one county reported keeping clients in that service beyond the point when they could have moved to less restrictive settings.

Programs that changed in both data recording and services made changes similar to programs that changed in services alone. Five counties reported changes in program attention to aftercare follow-along. Four counties made changes in social and/or vocational programs, sometimes at the cost of other programs, usually consultation and education and/or outpatient. This shift worried some administrators, for they saw in the reduction of community services and their shrinking ability to provide services to the middle class the erosion of their political base. In addition to its funding implications, this was viewed as leading away from the democratization of mental health services that occurred with the community mental health center movement. In the words of two administrators:

Aftercare is up at the cost of outpatient, but ultimately the cost is in loss of political support, which comes from care to the middle class. ... I look at the success of MR over MH. It's because of the MR lobby. ... Bureaucrats can't get State dollars without political clout. ... Over the long haul, we'll miss the middle class in the system because of their political power.

It was a shock when charity people had to be treated like private patients. ... This will disappear. [We] won't go back to 25 years ago but [we are] headed for two classes of service.

These general concerns do not identify the exact nature or degree of the actual changes taking place. However, one can turn to those administrators who provided details of their program changes for a better notion of their content.

Where aftercare follow-along was the target of change, reports included adding staff and sending staff to the State hospital to become involved in discharge planning. Although all counties see aftercare follow-along as necessary and see their own behavior as an aid to production of positive z-scores, they typically also see themselves as having gone overboard in their response, seeing people the day after release even though "they don't really need that" and pursuing clients who refuse to be seen or who refuse to participate in "reasonable" treatment plans. The counties are not sure the effort is worth it, yet, as one interviewee reported graphically:

If they refuse to see you, you get killed on the score ... [so you] don't call in advance, but go to their home. You have to be willing to trespass on behalf of the Department of Welfare. You must lay eyes on them while they tell you to go to hell.

In addition to resenting the staff time such pursuit requires, some administrators are concerned with other issues. Possible violation of patient rights was mentioned by two, and another feared that individuals would be retained in mental health treatment when they might better be served by other agencies following State hospital discharge. The latter administrator also complained of lack of credit for good coordination and referral networking to serve those clients. Another administrator observed:

As the State accomplishes its goals, the county is strained. The allocation process is based upon an untrue assumption regarding the priorities of keeping people from going to the State hospital and providing aftercare. Resources are diverted to chronics so [they] can't service people who have not been admitted. ...

Two counties wondered how they could best deal with the growing group of chronics who have never been in a State hospital and therefore are counted as acute care patients for performance indicator purposes, but are in need of chronic care. As use of community beds rather than State beds grows, this problem is expected to increase. One administrator remarked that with short hospitalizations in private beds, individuals were often released who were more in need of immediate followup than those who received longer term treatment in State facilities.

Where vocational rehabilitation and social rehabilitation are concerned, one administrator who claimed not to follow his own advice said:

If you have a person needing partial [hospitalization] you could instead play the game and generate lots of units by running people through social programs [which are cheaper since they don't require a psychiatrist's services].

Another administrator who uses VR and SR extensively noted with pleasure that this helps the county score, but attributes the use of those services to their suitability for clients in community residences.

Program changes also have been made to reduce State hospital admissions. One county placed a staff member at the prison upon finding that many prisoners were being admitted to the State hospital. Another developed its own local inpatient program to offer an alternative to State hospital use. But prevention of State hospital admissions is not without problems. In addition to reduced ability to use the aftercare designation for chronic individuals, counties reported jealously guarding their community beds for their own patients and lessening their cooperation with other counties in facilitating patient care. In addition, one interviewee reported that hospitals that participate in community programs serving Medicaid and county-pay clients suffer in image and, hence, income. When they accept too many such clients, private psychiatrists cut down on referrals, and the hospitals are then tempted not to renew county contracts or to raise the cost of such contracts. The predicted outcome was a return to increased State hospital use over the long term.

As attempts to generate revenue increase, outpatient services become less distinguishable from case management. Outpatient therapists now fill out Medicaid applications in one setting, while in others, case managers handle issues face-to-face that were formerly handled by telephone. One administrator reported seeing the changes as beneficial because they forced outpatient therapists to use their time more efficiently. Others were less enthusiastic. In counties where unemployment was high, attempts at revenue generation resulted in non-Medicaid-eligible clients dropping out of outpatient care. The same was true of clients who formerly paid on a sliding scale and were now being urged to apply for Medicaid. As one administrator put it:

Medicaid is up as people are being forced to apply for it to get mental health services. It's likely that other welfare is up also. This is good as it gives a truer picture of need, but it's bad as it takes away people's pride, initiative, and so on. ...

One alternative to revenue generation is lowering costs. Use of SR and VR was already mentioned as one way to cut costs. Another is to use less expensive staff. Two county administrators noted making reduced, minimal use of psychiatrists, although in one case, local M.D.s were preferred and available. Instead, for both, M.D.s were used only in the amount necessary to meet licensing requirements, while master's level therapists delivered most services.

Paradoxically, however, if a county puts no State dollars into a program, it cannot report units of service delivered by that program. Hence, in one case, service costs went up because a paid professional became involved in delivering a service formerly delivered solely by volunteers.

Keeping costs down is important. When treatment costs are high, there is no advantage to middle class individuals attending community programs. This means that community programs serve only charity cases and rely solely upon public funds. In addition, notes one administrator, "it means that there are two classes of mental health care, each of which must pay administrative costs." Hence, the economic burden of mental health care on society as a whole is greater than necessary.

Changes  That  Did  Not  Occur 

Innovation suffered, according to some administrators. Others said innovation was maintained at the cost of z-score performance. Some felt that the distribution-of-dollars indicator discriminates against innovation, since start-up costs for new equipment and administration of new programs are relatively high and revenue generation possibilities low. The adverse effect on z-scores prevents the budget increases needed to sustain new programs. In addition, when new programs do not fit into old categories, there is no way to get credit for them in the budgeting process no matter how efficient they are.

Others note that improvement of programs that are already bringing in z-scores above the mean has ceased. It seems a variant of the "If it ain't broke, don't fix it" philosophy. One notes that the system seems designed for traditional programs.

Although most express discouragement at the stifling of innovation, a minority suggest that changes may be rapid:

It seems to take too long to see results of improvement. When we see ourselves [near the] bottom, the temptation is to make drastic changes without careful planning.

In discussing the frustration in responding to the State's performance indicators, many individuals referred to the State Office's challenge to "come up with something better." While they endorsed the general notion of performance indicators and, as already noted, saw them as a definite improvement over former allocation processes, they also suggested reforms or elimination of specific measures. Some of these showed misunderstanding of how z-scores work.

Many complained about State hospital admissions. One complained about lack of county control over length of stay in State facilities; another, that "the State needs to know why beds are used before it can decide it's bad to use them." Most who complained did so on the basis of lack of private beds and a tradition of State hospital use in their counties that takes time to break. One noted that the State hospital in its area offers better treatment than alternate sources of care.

The most frequent target for suggested indicator change was case management. Said one administrator:

No factor measures non-face-to-face services such as court appearances. [Yet these are] especially important for children where we must meet with teachers and family to do a good job.

Another referred to a second case management function for which no credit was given:

When a case manager visits [another institution] regarding a client in [that facility] the visit is charged to the facility, not to the client; therefore, it doesn't get counted in performance indicators. Yet it takes time and is necessary for adequate service.

A third  defined  the  problem: 

The State needs input regarding definition and goals of case management. Then it can establish criteria. Now goals and criteria are not defined and evaluation [is based on] face-to-face contact. We will be forced to put less emphasis on treatment planning and monitoring and more emphasis on face-to-face. ...

The result, the administrator continued, will be loss of integration among county services and with referral sources; clients will fall between the cracks.

Some supporters of the indicators suggested that the name "performance indicators" should be changed to "allocation measures." This terminology was endorsed as more accurate and more acceptable to county commissioners and the public when numerical declines must be explained. This change was, in fact, made late in FY 1983-84. Some suggested that rather than using z-scores that accent competition among counties, absolute standards should be developed. These would make visible the improvements from year to year.
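The suggestion to replace z-scores with absolute standards rests on a property of standardized scores that a short sketch can illustrate. The sketch assumes a single hypothetical indicator and four hypothetical counties; the values, the threshold of 60, and the function names are invented for illustration and are not taken from the Pennsylvania allocation formula, which is not reproduced here. A z-score places each county relative to the mean of all counties in units of their standard deviation, so when every county improves by the same amount, every z-score stays the same.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize each county's value against the mean and SD of all counties."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def meets_standard(values, threshold):
    """Count counties at or above a fixed (absolute) standard."""
    return sum(1 for v in values if v >= threshold)


# Hypothetical indicator values for four counties (not data from the study).
year1 = [40.0, 50.0, 60.0, 70.0]
year2 = [v + 10.0 for v in year1]   # every county improves by 10 points

print([round(z, 2) for z in z_scores(year1)])   # [-1.34, -0.45, 0.45, 1.34]
print([round(z, 2) for z in z_scores(year2)])   # [-1.34, -0.45, 0.45, 1.34]
print(meets_standard(year1, 60.0), meets_standard(year2, 60.0))   # 2 3
```

The z-scores are identical in both years even though every county improved, while the count of counties meeting the fixed standard rises from 2 to 3. Because z-scores across counties always sum to zero, a gain in one county's relative standing must be offset by losses elsewhere, which is the competitive effect the administrators described.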


Those who complained about State goals and county goals being divergent offered a solution that involved local needs assessment and measurement of county progress against goals arising from that measure. They pointed to the wide variation among counties in population, socioeconomic status, and availability of services. Other relevant county characteristics include long, costly travel times in rural counties, and the presence or absence of State hospital facilities, private hospitals, and nursing home beds.

Some suggested that poor performance was self-perpetuating since poor performance on the indicators lowered the budget, which, in turn, made remediation difficult. They maintained that such counties need help rather than penalties and that performance indicators should be used for self-assessment rather than budgetary purposes. They noted that the large State match, coupled with performance indicator use, gave counties responsibility for service delivery without giving them program control.


Conclusion 

It bears repeating that despite numerous criticisms of performance indicator use in the Pennsylvania mental health funding process, a majority of county administrators see that process as an improvement over previous practice. In this sense, performance indicator use is indeed an overwhelming success, providing a rational method for distribution of resources. However, county quarrels with the specifics of the indicators, and their reports of their efforts to produce statistics that will aid them, reveal three difficulties. The first is the divergence of State and county priorities. Despite the impression in some counties that the State has failed to articulate goals and philosophies, it is clear to most, and indeed stated explicitly by State officials, that the State's primary objective is depopulation of State hospitals. Concomitant with that is the need to assure that service to chronic and severely impaired patients is available elsewhere. These goals are often at odds with county goals, especially in the less populated areas where few services are available and where economical provision of a range of services is difficult. They are also at odds with what many counties see as their first responsibility, primary and secondary prevention, and with their political base.

The second difficulty lies in the production of indicators. As Blau's (1955) case study, other literature presented in Ginsberg (this volume), and Campbell's (1979) theory suggest, use of quantitative indicators for decisionmaking has a corrupting effect upon those indicators. The Commonwealth of Pennsylvania, despite the care with which its allocation indicators for county mental health programs are selected, defined, and implemented, is no exception. Double counting, use of a single program to produce partial hospitalization, social rehabilitation, and vocational rehabilitation units, confusion between outpatient and case management, and the like are nearly universal. This would not pose so great a problem if the manipulation were done consistently; but from place to place and year to year, different strategies are employed, some with the advice and counsel of regional and/or State advisors, some without. This has no immediate effect on the allocation process other than that the more skillful and/or luckier data manipulators do better. In the long run, however, the effect upon planning may be devastating.

The third difficulty is that noted by Campbell (1979) and illustrated by Blau (1955): distortion of social process. The reported decline in cooperation among counties due to guarding of local non-State hospital psychiatric beds, the drop in interagency cooperation from pursuit of aftercare clients, and the short shrift for case management are dysfunctional side effects of indicator use from the viewpoint of both State and county. The same is true if, as predicted, middle class support for mental health declines and if separate public and private mental health systems replace current community programs. Similarly, if counties do not take financial responsibility for prevention or are unable to make consultation and education services self-supporting, the desired decline in State hospital population will be achieved only by treatment in county-funded and supervised beds.

Having identified these difficulties, one must still ask how seriously to regard them. That question is largely unanswered by the present research. The clash of ideologies does not unduly concern this researcher; it is in the open and can be the subject for negotiation between State and county and among citizens and legislators. It is the other two difficulties that contain the greater danger, and it is in these difficulties that the need for further research lies. To what extent are figures being manipulated? Do the manipulations cancel each other, or are they in a uniform direction? If in a uniform direction, what are the implications? Is case management declining or simply being called outpatient? If declining, what are the consequences for clients? Has the hospitalization rate gone down with provision of more varied county services, or only the rate in State hospitals? These questions are not easily answered. Interviews with county administrators give a notion of where to look for changes in mental health care delivery as a result of performance indicator use, but the actual investigation must take place at the level of service delivery. This remains to be done.


References 

Blau, P.M. The Dynamics of Bureaucracy. Chicago, Ill.: University of Chicago Press, 1955.

Campbell, D.T. Assessing the impact of planned social change. Evaluation and Program Planning 2:67-90, 1979.

Hadley, T.R.; Wilcox, J.T.; Rossman, G.R.; and Nazar, K. Performance standards and allocation of funds in community mental health programs--the Pennsylvania system. Administration in Mental Health 10:155-161, 1983.


A Case  Summary 

The  Department  of  Health,  Education,  and  Welfare’s  Development 
and  Use  of  Certification  Criteria* 


The Department of Health, Education, and Welfare’s (HEW) Division of Staff Resource Analysis, Office of the Secretary, developed and implemented, on a limited basis, certification criteria for evaluating staff management systems. Specifically, the criteria were to determine whether a system’s approach was feasible, how comprehensive it should be, the amounts of documentation that would be required, and how the system was used in managing staff resources. The criteria placed an emphasis on the use of work measurement standards. In 1977, however, the Division of Staff Resource Analysis’ use of the certification criteria ended. The termination of the certification reviews was attributed generally to a lack of high-level support within the Department.

Background on Development of HEW’s Certification Criteria

HEW initiated several staff management systems during the early 1970s. The impetus for developing the first of these systems, the Manpower Utilization Program, was a 1971 report by the House Appropriations Committee on HEW’s manpower management policies and practices, which cited many deficiencies in HEW personnel practices, including a lack of continuing reviews of operations and personnel requirements and a lack of personnel measurement systems. The Manpower Utilization Program, later designated the Manpower Management Program and then the Staff Resource Management Program, was initiated to improve staff use through a Department-wide program of work measurement systems and staff utilization surveys. The responsibility for developing and implementing specific systems was assigned to each principal operating component, with guidance to be provided by the Office of the Secretary. Additionally, the Department made improved productivity through personnel management a major goal. This was an attempt to further implement effective personnel management systems.

*This case summary is taken from Evaluating a Performance Measurement System--A Guide for the Congress and Federal Agencies, U.S. General Accounting Office, Washington, D.C., May 1980.

In February 1975, the House Appropriations Committee issued another report on HEW’s personnel management policies and practices. While recognizing that HEW had taken steps to improve its personnel management, this report identified several problems, including the lack of a meaningful relationship between the personnel program and the budget process, and lack of interest, acceptance, or cooperation on the part of HEW agencies in establishing systems. The House Appropriations Committee, in its report on HEW’s fiscal 1977 appropriations, was again critical of the slow progress in implementing personnel management programs. This report encouraged the Secretary of HEW to assign this effort a high priority.

In October 1976, to emphasize the importance of the program, the Assistant Secretary, Comptroller, advised HEW agency heads that additional staff would not be approved in the 1978 budget unless supported by a reasonably well developed personnel management system. Furthermore, acceptance of any system was conditioned upon certification by the Division of Staff Resource Analysis. In early 1977, the certification process received added emphasis through the reassignment of staff resource management specialists to the budget office. This reorganization was an attempt to more closely tie work measurement to the budget process.

Components and Use of HEW’s Certification Criteria

The HEW certification criteria were developed to approve, or certify, operational staff resource management systems by defining and comparing elements of an adequate system, including work measurement. Because different work measurement techniques were used throughout HEW, there were variations in the methods used, the accuracy of the data, the comprehensiveness of systems, and the degree of application, thus making it impractical to develop specific quantitative criteria. As a result, broad criteria were developed. These criteria consisted of a checklist of essential program components and addressed the following areas:

• Work  units  (outputs) 

• Work  count  system 

• Time  values  (standard  time,  average  time) 

• Staffing  budgets 

• Reallocations 

• Staff resource management systems maintenance

• Application of the system

Reviewers in the Division of Staff Resource Analysis evaluated staff resource management systems using a detailed certification criteria checklist, and a certification status (certified, not certified, or provisionally certified) was assigned. Provisional certification meant that a number of improvements had to be implemented before a system could be accorded full certified status.

The Division of Staff Resource Analysis conducted only a limited number of certification reviews during 1977, most of which were in the Public Health Service (PHS). These reviews pointed out deficiencies in the systems and recommended areas for improvement. For example, a June 1977 review of PHS's Center for Disease Control resulted in a provisional certification for most systems and noted the following deficiencies: (1) system objectives were incomplete; (2) work units were not defined; (3) there was no evidence of workload forecasting methods, documentation, and accuracy of past, current, or future budget submissions; and (4) no evidence existed to show any use of staff resource management systems after budget preparation or any system documentation for ongoing control or allocation of staff positions. Improvements would have had to be made in each of the above areas to warrant a full certification. The former Director, Division of Staff Resource Analysis, stated that only one organization's system, that of the Bureau of Hearing and Appeals, Social Security Administration, received a full certification.


Termination  of  Certification  Reviews 

HEW's Division of Staff Resource Analysis has not performed any certification reviews since 1977. Department officials responsible for developing and implementing the criteria stated there was a general lack of high-level support for the certification reviews. This lack of support was further substantiated by a 1977 HEW Audit Agency report, which noted that no HEW principal operating components had developed certifiable personnel management systems. The report stated that the major factor contributing to this situation was insufficient management interest.

The Public Health Service, however, continues to place a major emphasis on the use of work measurement and resource utilization system data in its budget justification. It is the only principal operating component that still uses the HEW criteria for certifying its systems. Generally, PHS will not support requests for personnel increases in organizations with noncertifiable systems. The HEW criteria, which define adequate systems, have proven helpful to PHS in its development and maintenance of work measurement and resource utilization systems. PHS officials stated that the Department, however, has not supported them in their use of certification reviews.

We believe that without high-level Department support and evaluation criteria, HEW and other Federal agencies are unable to fully assess the adequacy and utility of their measurement systems.


A Case  Summary 

Performance Information in the Atlanta School System¹


Bayla  F.  White 
The Urban Institute²


Urban Institute staff had observed that school systems seldom have systematic ways to identify and therefore to learn from their educational successes and failures. The Urban Institute researchers believed that developing and making such information available would lead administrators to operate more rationally. Yet 1 year after having developed and distributed performance information in the Atlanta school system, they found little evidence of its use in school system management.


Project  Initiation 

The Urban Institute chose the Atlanta school system to participate in a cooperative study because that system had sufficient data to explore performance questions. In addition, its officials professed strong interest in improving student performance and seemed receptive to new ideas. An agreement was made for a 6-month initial study of the feasibility of constructing a system of measures of the relative performance of schools and of grades within a school, since these units, rather than individual students, represent the lowest units of resource allocation decisions. During this period, The Urban Institute researchers became familiar with the Atlanta school system and the data available. They collected data on a sample of schools, did analyses, and had the Atlanta school staff review the results.


¹This summary narrative was extracted from B.F. White's longer description in "The Atlanta Project: How One Large School System Responded to Performance Information," Policy Analysis 1(4):659-691, 1975. Copyright 1975 by the Regents of the University of California.

²Now with the Office of Management and Budget.


The  Performance  Measurement  System 

Discussions with Atlanta officials indicated that they generally made staff and resource allocation decisions without the benefit of any consistent information on performance. Asking what they wanted to know about performance revealed that student achievement was the only indicator on which they agreed. Learning to read and to solve simple arithmetic problems therefore were accepted as criteria for elementary school students. The Federal Office of Economic Opportunity (OEO) funded a proposal to design a technique for comparing relative performance (at the grade level) among all Atlanta elementary schools and then to test the effects of the information on selected school system operations.

The approach adopted compares performance among schools serving similar students, identifies significant cases of extreme performance, and displays the results as a series of charts in which red signals denote levels of relatively low performance and blue signals denote levels of relatively high performance. Mean (average) achievement on the annual standardized tests was used as the measure of performance; schools were identified as similar on the basis of the level of their student participation in the free and reduced-price lunch program. Since family size and income (self-declared) determined eligibility for the free and reduced-price lunch program, the percentage of students who participated in this program provided an indicator of the percentage of poor students at each school. (This variable alone accounted for 50 to 80 percent of the variation in average scores within each grade level.)
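The phrase "accounted for 50 to 80 percent of the variation" is the language of a coefficient of determination (R²). The summary does not say exactly how schools were matched on poverty, so the following Python sketch assumes a simple linear regression of each grade's mean score on the percentage of students in the lunch program, with the residuals serving as a poverty-adjusted performance measure. The function name `fit_poverty_adjustment` and all data are hypothetical.

```python
from statistics import mean

def fit_poverty_adjustment(lunch_pct, mean_scores):
    """Ordinary least-squares fit of mean_score = a + b * lunch_pct.

    Returns (a, b, residuals, r_squared). A positive residual means a
    grade scored above what its poverty level alone would predict.
    """
    x_bar, y_bar = mean(lunch_pct), mean(mean_scores)
    sxx = sum((x - x_bar) ** 2 for x in lunch_pct)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lunch_pct, mean_scores))
    b = sxy / sxx                       # slope: score change per lunch-program point
    a = y_bar - b * x_bar               # intercept
    residuals = [y - (a + b * x) for x, y in zip(lunch_pct, mean_scores)]
    ss_res = sum(r ** 2 for r in residuals)              # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in mean_scores)  # total variation
    return a, b, residuals, 1 - ss_res / ss_tot

# Hypothetical data for one grade level in five schools: percent of
# students in the free and reduced-price lunch program, and mean
# grade-equivalent score on the reading subtest.
lunch = [10, 30, 50, 70, 90]
scores = [4.0, 3.2, 3.6, 2.8, 3.0]
a, b, resid, r2 = fit_poverty_adjustment(lunch, scores)
print(f"slope per lunch-program point: {b:.4f}; R^2 = {r2:.2f}")
```

With these invented numbers R² comes out at about 0.62, inside the range the text reports only because the data were chosen that way. The residuals are the kind of poverty-adjusted quantity on which relative-performance comparisons could then be based.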

Scores on the reading and arithmetic subtests of the achievement tests administered each April were used. Signals of relative performance for each grade were calculated for schools similar in levels of poverty. About 10 to 15 percent of the grades at each level were selected as extreme. These extremely high or low performance levels were further differentiated by coloring the entire signal for the most extreme levels and coloring only half the signal for the less extreme levels. When the performance was not extreme, the signal was not colored. These signals are shown in figure 1. (Red signals appear here as solid black and blue signals as striped.) This format lets viewers see at a glance the relative performance at a school.
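The signal rules can be sketched in the same hedged spirit. The summary specifies only that about 10 to 15 percent of grades were flagged as extreme and that the most extreme received a fully colored signal. The split between the low and high tails, the full/half cutoff, and the use of residuals from a poverty adjustment are assumptions of this illustrative Python function, `assign_signals`, which is hypothetical and not the Atlanta procedure.

```python
def assign_signals(residuals, extreme_share=0.15):
    """Map each grade's residual (actual minus poverty-predicted mean score)
    to a signal label.

    Illustrative assumptions, not the documented Atlanta rules:
    - `extreme_share` of all grades are flagged, split evenly between the
      low (red) and high (blue) tails;
    - within each tail, the more extreme half (rounded up) gets a full
      signal and the rest a half signal;
    - all other grades get an uncolored signal ("none").
    """
    n = len(residuals)
    per_tail = max(1, int(n * extreme_share / 2 + 0.5))  # round half up
    per_tail = min(per_tail, n // 2)  # keep the two tails from overlapping
    full_count = (per_tail + 1) // 2
    order = sorted(range(n), key=lambda i: residuals[i])  # lowest first
    signals = ["none"] * n
    for rank, i in enumerate(order[:per_tail]):  # lowest residuals -> red
        signals[i] = "red-full" if rank < full_count else "red-half"
    for rank, i in enumerate(reversed(order[n - per_tail:])):  # highest -> blue
        signals[i] = "blue-full" if rank < full_count else "blue-half"
    return signals

# Hypothetical residuals for 20 grades in schools of similar poverty.
residuals = [0.05, -0.30, 0.12, 0.41, -0.02, -0.18, 0.09, 0.22, -0.45, 0.01,
             -0.07, 0.15, -0.11, 0.33, -0.25, 0.04, -0.09, 0.18, -0.03, 0.06]
signals = assign_signals(residuals, extreme_share=0.15)
for i, (r, s) in enumerate(zip(residuals, signals)):
    if s != "none":
        print(i, r, s)
```

With 20 grades and `extreme_share=0.15`, two grades fall in each tail: the lowest residual gets a full red signal and the next a half red signal, mirrored for blue at the top.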


Use  of  Signals 

Signal  booklets  were  distributed  within  the 
Atlanta  school  system  in  November  1972, 
providing  data  for  2 years  to  each  school,  as 
shown  in  figure  1.  Each  of  the  five  geographic 
districts  received  another  display  of  all  signals 
for  every  school  in  that  district  for  1 year. 

The distribution of signal booklets to principals and area resource teachers began with a series of briefings organized by each area superintendent. The booklets were explained and distributed to instruction division and personnel division staff in December 1972 and early January 1973. Classroom teachers never received the booklets directly, as the signals were viewed primarily as a management tool for use by higher school officials. The extent to which signals were made available to classroom teachers was left to the discretion of each principal.


Figure 1. Signals distributed in November 1972

[Atlanta Public Schools, Metropolitan Achievement Tests (MAT), Spring Signals. Panels: Reading; Arithmetic (problem solving). Columns: grade, '71, '72.]

Source: White, B.F. Copyright 1975 by The Regents of the University of California. Excerpted from Policy Analysis 1(4):659-691, Fall 1975, by permission of the Regents.

From principals on up through the ranks, Atlanta administrators liked the signal booklets and praised their form and manner of presentation. Atlanta staff at all levels endorsed the principle of comparing schools serving similar student populations. Especially appealing to some principals was the idea of comparing Atlanta schools with one another rather than with the national norm established by the achievement test manufacturers.

Few principals expressed surprise at the signals for their schools, a reaction that lends operational support to the statistical evidence of the signals' validity. The principals generally attributed a high (blue) signal in a grade to the success of a teacher or teachers, and a low (red) signal to student discipline problems, student transiency, overcrowding, or the absence and/or inexperience of particular teachers. They found it more difficult to account for situations in which math and reading signals differed for the same grade.

Resource teachers and area superintendents had more difficulty interpreting signals for specific schools and grades, because the impressions on which they based their judgments of performance were not equally current for all schools. They usually explained a high or low signal in terms of the quality of teaching in the grade or the overall quality of the faculty at the school. Other explanations for signals, either high or low, included curriculum changes, test administration, teachers' attitudes, and the presence of unusual students at the school.


The  Search  for  Impact 

To assess the impact of information about relative school performance, three important activities of the school system that might be used to improve student performance in the classroom were examined: (1) the recruitment, assignment, and reassignment of staff; (2) the design of the instructional program and the provision of instructional material; and (3) efforts to improve classroom teachers' skills. The issue in each case was whether signal information would affect decisions.

The Urban Institute staff established how these functions normally were carried out in the Atlanta school system and then monitored them during the 1972-73 school year for impacts of the signal information.

Impact  on  the  Recruitment,  Assignment, 
and  Reassignment  of  Staff 

From the beginning of the project, Atlanta officials maintained that the classroom teacher was the key to student performance. Area superintendents, resource teachers, and principals alike often attributed the presence of a high or a low signal to the quality and attitudes of the teaching staff. Yet neither the assignment of individual teachers nor the staffing of schools was substantially affected by the introduction of information about relative performance.

At the time the signals were explained and distributed, personnel division staff expressed interest in using them to identify characteristics of teachers associated with high- or low-performing grades. They wanted to determine whether training, years of experience, type of certification, or teacher turnover appeared to affect the level of performance among grades in similar schools. In effect, they saw the signal information as an output measure to be used in developing a screening device for new teachers or in making teacher assignments. Yet there was no evidence that anyone followed through on this interest.

According  to  the  assistant  superintendent 
for  personnel,  signals  played  no  part  in  the 
central  decisions  about  teacher  reassignment 
made  during  the  last  months  of  the  1972-73 
school  year.  Area  superintendents  continued  to 
transfer  teachers  between  schools  without 
regard  either  to  the  performance  of  the 
students  in  the  home  school  location  or  to  the 
performance  of  students  in  the  new  school 
assignment. 

During the 1972-73 school year, decisions about staffing were governed by three factors: decreasing student enrollment, a midyear cutback in Federal funds, and the existing court order on teacher desegregation. Decreasing student enrollment had an obvious impact on staffing. No new teachers, except special education teachers, were hired between January and August of 1973; in fact, Atlanta developed a huge backlog of applicants for teaching positions. The closing of 12 elementary schools and the conversion of 2 high schools into middle schools meant that teachers from those schools had to be placed elsewhere. The midyear cutback in Federal funds meant that teachers hired under annual contracts had to be absorbed by the system. Atlanta depended upon the natural attrition of staff through retirements and resignations to make the situation somewhat less desperate. Finally, the terms of the existing school desegregation order meant that all staffing decisions had to be dictated first by the requirements of the court-ordered racial ratios for individual schools.

Impact  on  the  Instructional  Program 

Since the development of curriculum for the entire school system and the demonstration of new instructional approaches are usually multiyear activities, it might have been predicted that the signal information would have little or no impact on decisions relating to the instructional program in the short period of this project. At best, the instruction division staff might have been expected to use the signals in investigating the characteristics of the instructional program in grades where performance was either relatively high or relatively low.

Yet there was no evidence to show that this had been done. For example, instruction division staff could cite no attempts to develop empirical evidence of the success or failure of particular textbook series. Although several principals reported that new textbooks had been adopted for the coming year, signal information played no part in decisions about which texts to use. While 30 percent of the first grades in one area received low (red) signals in math and 40 percent of the third grades in another area received high (blue) signals in math, neither area office personnel nor curriculum development staff reported having given any thought to what might have occurred in those grades.

Nor was there any evidence that signals had even been considered in decisions involving the future shape of the instructional program. The entire elementary school curriculum had been under revision since 1971, and as pieces of the curriculum were developed, they were field tested in a sample of Atlanta elementary schools. But neither performance information in general, nor the signals in particular, seem to have affected the appraisal of the curriculum as it was developed or the choice of new pilot locations for field testing.




Impact  on  Efforts  to  Improve  Skills  of 
Classroom  Teachers 

No Atlanta official reported using signal information in decisions regarding teacher training. The resource teachers who organized and conducted workshops could cite no instances of signals' having been used in decisions about the subjects to be offered or the location of the workshops. Neither principals nor resource teachers remembered suggesting that a teacher from a grade with relatively low performance enroll in an in-service program or proficiency module, although they had made such suggestions without regard to student performance.

Data on the activities of resource teachers were examined to determine whether these teachers tailored their assistance to classroom teachers according to signal information. Analysis revealed no evidence that resource teachers singled out high- or low-performing grades for special attention. They tended to visit grades and classrooms without regard to performance.


Why  Were  the  Signals  Ignored? 

At least two factors seem to account for Atlanta's failure to use the new performance information. First, no change of this sort can come about unless the larger political or organizational milieu encourages it. In the period when the signal information was introduced to Atlanta officials, the political and organizational climate argued forcefully against change. Second, any change in the operations or established patterns of a large bureaucracy occurs slowly and, even then, only if administrators are strongly encouraged to make the change and have mechanisms for doing so. These conditions, too, were lacking in Atlanta.

The political and organizational milieu in Atlanta during the 1972-73 school year was one of tremendous uncertainty, as the whole school system awaited the decision of a Federal court in a desegregation suit. That decision, when it finally came in April 1973, resulted in a substantial number of changes in the school system as control passed from whites to blacks.

Under the terms of the compromise plan finally accepted by the court, no Atlanta school would have less than 20 percent black enrollment, although some all-black schools would remain. The required racial composition was to be achieved through a combination of techniques, including pairing of schools, voluntary student transfers, school closings, and some busing. Twenty-seven schools would require a reassignment of teachers in order to alter the racial composition of the faculty. Over two-thirds of Atlanta's schools had been affected by the compromise plan by the time school opened in September 1973.

The compromise plan did not stop with the transfer of students and staff. A major requirement was that at least 50 percent of the top administrative positions, including that of superintendent, be filled by blacks. The process of selecting a new superintendent, coupled with the potential expansion and reorganization of the top-level administrative staff, created an atmosphere of uncertainty among existing administrative personnel, which resulted in a general reluctance to make decisions or to take actions.

Quite independent of the desegregation plan, a number of other administrative changes occurred during the year. For several of the preceding years, none of the top 13 executive positions in the school system had changed hands. Yet, within the 12-month period beginning in August 1972, 4 members of the executive staff besides the superintendent (3 area superintendents and the assistant superintendent for personnel) were replaced. Moreover, a new city charter completely changed the method of election to the school board, so that control passed from a white to a black majority. On top of that, declining enrollment necessitated the closing of a dozen schools and the resulting transfer of students and staff. And a drastic midyear cut in Federal funds for programs under title IV-A of the Social Security Act forced the school system to terminate certain programs and find other employment for staff hired under annual contracts. Faced with all of these pressures, the school system largely marked time during the year that signal information was introduced into Atlanta.

But political flux alone is not a sufficient explanation for the failure to use performance information. The Atlanta Project has shown that information on performance will not by itself alter the decisions and actions of school officials. Administrators who are presented with the information must be shown its advantages and encouraged to abandon their old practices for new ones. Such instruction and encouragement were minimal in Atlanta: the ground rules set by OEO for this project precluded The Urban Institute staff from devoting any substantial effort to showing Atlanta officials how to use the signals. Even more critical, Atlanta lacked well-defined management mechanisms that would enable school officials (either line or staff) to make use of them.

In other words, the signals of relative performance provide a way for school officials to identify examples of the relative failure and the relative success of efforts to teach basic skills; this represents only a first step, however, in a logical sequence of activities that, taken together, should improve educational management and quality. Once officials can reliably and accurately diagnose educational problems, they must learn to design and apply educational treatments systematically, to assess carefully the success or failure of the treatment applied, and last, but by no means least, to use the information from those assessments to redesign the treatment. Atlanta, like most school systems, had never dealt with educational performance in such a systematic way. Changes were constantly being made in the school system, but often without any apparent rationale, without any careful followup, and without regard to their effect on performance.

What will cause this situation to change? There is no simple answer, either for Atlanta or for large school systems in general. The experience in Atlanta seems to indicate that it will take time and demonstrations of how to use performance information before decisionmaking patterns can be altered and performance improved. How much time, how best to demonstrate the use, and who should do the demonstrating are questions for further exploration.




Concluding  Remarks 

The  Performance  of  Performance  Measurement 


Part I of this book presented a model for implementing performance measurement that comprises four elements (goals, performance indicators, data systems, and systems for use of measures) and four levels of functioning (research and development, system design, commitment generation, and operations). In addition to guiding implementation of a performance measurement system, this model provides a basis for diagnosing problems in the functioning of any given performance measurement system. If the model is used to examine a number of systems, the adequacy of the general performance measurement approach to program management can be evaluated. To use the model for evaluation, one should answer the questions in the 16 cells of the matrix shown in table 1.
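Because the matrix is the cross product of four elements and four levels, an appraiser's record of actions can be kept as a mapping from (element, level) pairs to lists of actions. The Python sketch below, whose function name and sample data are hypothetical, lists the cells with no recorded action, the kind of large omission the procedure is meant to expose. The table 1 questions themselves are not reproduced here.

```python
from itertools import product

ELEMENTS = ["goals", "performance indicators", "data systems",
            "systems for use of measures"]
LEVELS = ["research and development", "system design",
          "commitment generation", "operations"]

def empty_cells(actions):
    """Return the (element, level) cells that have no recorded action.

    `actions` maps (element, level) tuples to lists of action descriptions;
    this structure is a hypothetical bookkeeping aid, not part of the model.
    """
    return [cell for cell in product(ELEMENTS, LEVELS) if not actions.get(cell)]

# Hypothetical appraisal: every cell has an entry except one.
actions = {cell: ["(action noted)"] for cell in product(ELEMENTS, LEVELS)}
actions[("systems for use of measures", "operations")] = []

gaps = empty_cells(actions)
total = len(ELEMENTS) * len(LEVELS)
print(f"{total - len(gaps)} of {total} cells have recorded actions; empty: {gaps}")
```

Such a tally shows only whether a cell has an entry, not the extent, quality, or adequacy of the actions listed.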


Use  of  the  Matrix 

Table 2 illustrates the use of this matrix with suggested answers to the 16 questions listed in table 1 for the community mental health centers (CMHCs) performance measurement system developed as part of the National Institute of Mental Health (NIMH) Operations Management System (OMS). This performance measurement system came into being 15 years after the CMHC program began and 11 years after NIMH developed a categorical program evaluation effort focused on the CMHC program.

The first action leading toward performance measurement in the CMHC program was a contract given to The Urban Institute to assess NIMH evaluation activities in order to recommend improvements in the agency's management of such activities. After critiquing and finding many deficiencies in the administration, design, and subsequent use of past evaluation studies in NIMH (Weidman et al. 1973), The Urban Institute concluded that the CMHC program, as it was then defined by management and program documentation, lacked the measurable objectives and specified causal links necessary to be evaluated, except in the area of economic viability (Horst et al. 1974; Wholey et al. 1975). As part of its contract for NIMH, The Urban Institute developed an evaluation system in which program models, decision analyses, and estimates of the value and cost of evaluative information to program management would underlie the selection of program-relevant evaluations (Weidman et al. 1973).

This experience seems to have played a part in developing the concept of evaluability assessment (Horst et al. 1974; Wholey et al. 1975), which became a major focus of the program evaluations of the U.S. Department of Health and Human Services (DHHS) in fiscal year 1980 after Wholey and his associate Schmidt from The Urban Institute moved into key positions in managing program evaluation in the Office of the Secretary. Although The Urban Institute's recommended procedures to help NIMH management create an evaluable program had been abandoned by NIMH after a brief attempt at implementation, this approach was reapplied throughout DHHS after Wholey and Schmidt took charge of DHHS program evaluations. The Department's instruction to agencies within the Department concerning the use of fiscal year 1980 1-percent set-aside funds for program evaluation specified the desirability of evaluability assessments as an initial form of evaluation and exempted this form of study from Department echelon approval, a considerable incentive, as approval frequently took a long time and was never certain. Further, staff members of the Office of the Secretary themselves did an evaluability assessment of the CMHC program (Jewell et al. 1980).

This initial step was followed by a contract to develop measurable indicators for the previously identified objectives (Granville Corporation 1980). This contract was in effect when the Department's Operations Management System was initiated in 1979 (Harris 1979). The contract effort to make the CMHC program evaluable was oriented to supporting NIMH efforts to find meaningful measures of CMHC performance that could be collected on a regular basis from all CMHCs. A joint planning effort among CMHC program managers, NIMH biometricians, and consultants from centers, States, and universities led to the selection of 13 indicators of CMHC functioning toward the goals of service accessibility, efficiency, and financial viability (Granville Corporation 1980; NIMH 1981).

Field trips were taken and communications sent to involve and inform Federal Regional Offices, CMHCs, and States concerning the design and implementation of the performance measurement system. A special inventory was developed and used to collect information for 1980 from all CMHCs, and a manual to guide centers in interpreting measures was distributed. When the reported data had been processed, comparative information about each center's relative standing on each of the OMS measures was sent to all centers, States, and Regional Offices.

As this performance measurement procedure was being developed, the general approach of measuring performance received support from passage of the Mental Health Systems Act (Public Law 96-398) in October 1980, since this legislation proposed that performance measures should be developed and used to monitor grants for a variety of mental health services.

The support this act gave to performance measurement was short-lived, however. Both the Systems Act and the major Federal Government use of OMS performance data on CMHCs were displaced by the Reagan Administration's block grant program (Public Law 97-35), passed in August 1981.

This brief history is meant to give readers sufficient understanding of the CMHC performance measurement system to interpret the abbreviated answers displayed in table 2. The answers shown were based on information obtained in spring 1981 interviews with NIMH staff involved with NIMH's OMS effort for CMHCs in 1980-81. The status of the performance measurement system in the spring of 1981 was provisional and in field testing; that is, a deliberately incomplete and admittedly imperfect set of measures was being made on a first-time basis and without specification of relationships between the measures and oversight decisions.

The evaluation of the system as only provisional and not fully operational seems appropriate for both technical and political reasons. Technically, the measures had not been validated, and the accuracy of self-reported data under conditions of government monitoring is doubtful. Politically, the Federal role of monitoring the CMHC program seemed likely to be (and subsequently was) greatly reshaped. This made the Federal effort to develop performance measures no longer an effort to increase its own administration of a program but simply a possible form of technical assistance to States such as Colorado (Miller and Wilson 1981) or Pennsylvania (Hadley et al. 1981) that were interested in using this form of monitoring.

Clear limits exist to what can be learned from such a simple procedure as the use of the 16-cell matrix to identify the progress of a given system. The procedure serves mainly to suggest large omissions in the implementation. The 1981 progress report of the NIMH OMS for CMHCs reveals a wide variety of preparatory and developmental actions that could be identified as applying in all but 1 of the 16 cells of the matrix. However, a simple listing of actions does not make clear the extent, quality, or adequacy of the actions. Criteria by which to determine the adequacy of actions would be hard to specify and thus would be likely to vary widely among raters.

Another problem with the use of the matrix format to list actions is that, if similar amounts of space are provided, descriptions are likely to be tailored to fit this space, thus producing similar detail in each cell. Raters are inclined to try to think of something to give in each cell but not to give multiple responses in any given cell. Thus, the detail in responses is likely to be distorted from the appropriate level simply by the format of the questions. Yet another problem with the matrix is that it does not attempt an integrated summary of the overall state of the system.

What  may  appear  to  be  a problem  with  this 
appraisal  system  is  that  it  does  not  cover  the 
condition  of  the  program  itself,  on  which  the 
program  performance  measiarement  sj^tem 
clearly  depends.  Thus,  NlMH's  OMS  for 
CMHCs  ended  because  Federal  management  of 
the  program  was  radically  curtailed  by  the 
substitution  of  block  for  categorical  grants. 
Such  conditions  are  clearly  outside  the  domain 
of  the  performance  meeisurement  system.  Un- 
less the  decisions  about  the  program  are  in- 
fluenced by  the  state  of  the  performance 
measurement  system  (not  the  case  with  the 


CMHC program), the system should not be judged by events beyond the control of performance measurement staff.


Another Evaluation Approach

These same problems also apply to Evaluating a Performance Measurement System, a guide developed by the U.S. General Accounting Office (GAO 1980). This guide takes the form of a checklist covering the necessary components of a performance measurement system and whether the system is being used for management and accountability. Eighty-eight questions fall into eight sections covering the scope and objectives of the organization, the scope of the performance measurement system, the structure of the performance measurement system (which takes up almost half the checklist and deals with inputs and outputs and with the organization's planning, budgeting, and position management processes), performance measurement system maintenance, forecasting, performance reporting, and the institutionalization of the system. This framework differs from the 16-cell matrix mainly by (1) being descriptive rather than historical (conditions are to be described but not explained as the result of a process), (2) going into much more detail on the management functions to which performance measures might relate, and (3) ignoring both the need for research and development to establish the validity of components of the system and the need to secure the commitment of program staff to the concept of performance measurement.

The GAO guide poses relatively better questions about the potential uses of measures in internal management but lacks a framework to suggest how a performance measurement system should be developed and installed. GAO seems to have developed its guide for the fairly specific situation in which managers within a Federal agency use measures to control subordinates. The matrix prototype was designed to help a government agency monitor grantees. In the prototype for external monitoring, it is more apparent that cooperation by those monitored is essential, and that research is needed to understand how to establish and maintain valid reporting and monitoring systems. Although this need is more apparent in external monitoring, it may be no less real in internal management, especially in bureaucracies with large proportions of professionals accustomed to autonomy. If, as this editor believes, the major problems of performance measurement systems stem from a lack of appropriate developmental and support-generation activities preceding installation, the GAO guide will miss the critical reasons for the failure of performance measurement systems.

What conclusions about the current feasibility and desirability of performance measurement can be drawn from reviewing the state of the arts of measurement and management and several recent case studies of systems? The art of measurement has many limitations in its application to such complex phenomena as public service programs. Many of these limitations stem from the still relatively underdeveloped condition of the social service technologies that underlie programs. In addition, even more severe limitations on what measurement techniques can do are imposed by the orientation and limitations of management. Complicated measurement techniques designed to handle the complexity of programs are likely to be ignored by management. Too much focus on measurement procedures may result in distortions, both by ignoring features that are difficult to anticipate or measure and by converting the political process of determining the values a program should support from participative competition among interests into closed, elite-controlled decisions based on presumed technical considerations. Awareness of the likely dysfunctional consequences of performance measures (Ridgway 1956; Ginsberg 1984) serves as a useful corrective to excess optimism.

To appraise how performance measurement systems have actually performed, each of the cases in part III could be assessed with the 16-cell matrix. This temptation was resisted for several reasons: (1) Many cases were multigenerational; that is, the systems evolved through several stages, and somewhat different appraisals would apply to the different stages. (2) Since the case descriptions were prepared, changes introduced by the Reagan Administration have greatly modified the financing and administration of many social service programs and eliminated many of the performance measurement systems managed by the Federal Government. (3) The particular performance measurement systems selected as case studies were not chosen for their representativeness, so treating these case studies as a sample of the total would be inappropriate.

In spite of these qualifications, the matrix does provide a convenient way to summarize conclusions about past performance measurement systems. Table 3 summarizes the editor's impressions from the cases in part III, as well as from other selected cases (e.g., Bureau of Community Health Services 1978; Griffith 1978). These systems have had shortcomings both in the amount of research on any of their components and in the amount of acceptance generated among those who had to report information or in the general public. Both of these tasks, of course, are time-consuming, expensive, and uncertain of success. On the other hand, lack of information about measures and lack of acceptance of the system by participants may make failure certain.

Table 3. Usual history of recent performance measurement systems*

Levels of functioning     All components (specification of program goals;
                          specification of criteria and measures;
                          establishment of a data system; and systems for
                          using measurements)

Research and development  Formal research seems extremely rare on any part of
                          performance measurement systems; there is often
                          developmental work to refine components.

Design                    General goals, measures, and use systems are usually
                          decided by managers to implement a policy of
                          accountability, with detail worked out by technical
                          staff. Data systems are usually modifications of
                          existing systems.

Commitment-generation     Some efforts to win acceptance and compliance from
                          reporting elements are usually made but do not go so
                          far as to share decisions about more than minor
                          details.

Operations                Usually systems begin after a brief preparation
                          period. Relatively little use is usually made of
                          measures by central management, and less at lower
                          echelons. Many systems go through several
                          generations of modifications fairly rapidly and
                          expire, with this career determined mainly by
                          changes in program management or policy.

*As seen by the editor.

The fact that the operational climate is often dominated by highly changeable administrative policy considerations works against a long-term effort in performance measurement. Thus, an essential precondition for making performance measurement successful is frequently lacking. The absence of consistency within any one effort at performance measurement means that those who advocate such systems will not have a chance to learn from their own trials and errors but must do so from others' experiences. This book has tried to capture these vicarious experiences.


References 

Bureau of Community Health Services. Instruction Manual for the BCHS Common Reporting Requirements. Rockville, Md.: U.S. Department of Health, Education, and Welfare, 1978.

Ginsberg, P.E. The Dysfunctional Side-Effects of Quantitative Indicator Production: Illustrations from Mental Health Care. Evaluation and Program Planning 7:1-12, 1984.

Granville Corporation. Development of Performance Indicators for the Community Mental Health Centers Program. Contract report to the Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services. Washington, D.C.: the Corporation, December 1980.

Griffith, J.R. Measuring Hospital Performance. Chicago, Ill.: Blue Cross Association, 1978.

Harris, P.R. Establishment of the Operations Management System and related studies of selected agencies and programs. Memorandum from the Secretary of Health, Education, and Welfare, December 18, 1979.

Horst, P.; Scanlon, J.W.; Schmidt, R.E.; and Wholey, J.S. Evaluation Planning at the National Institute of Mental Health: A Case History. Washington, D.C.: The Urban Institute, 1974.

Jewell, M.; Beyna, L.; Yates, E.; and Walker, E. Exploratory Evaluation of the Community Mental Health Centers Program. Washington, D.C.: U.S. Department of Health, Education, and Welfare, 1980.

Miller, S., and Wilson, H. The case for performance contracting. Administration in Mental Health 8:185-193, 1981.

National Institute of Mental Health. A guidebook to the 1981 Operations Management System for Federally Funded Community Mental Health Centers. Rockville, Md.: NIMH, 1981.

Ridgway, V.F. Dysfunctional consequences of performance measures. Administrative Science Quarterly 1:240-247, 1956.

U.S. General Accounting Office. Evaluating a Performance Measurement System--A Guide for the Congress and Federal Agencies. Washington, D.C.: GAO, 1980.

Weidman, D.R.; Horst, P.; Taher, G.M.; and Wholey, J.S. Design of an Evaluation System for the National Institute of Mental Health. Contract report by The Urban Institute. Acquisition No. PB-221-180. Springfield, Va.: National Technical Information Service, 1973.

Wholey, J.S.; Nay, J.N.; Scanlon, J.W.; and Schmidt, R.E. Evaluation: When is it really needed? Evaluation 2(2):89-93, 1975.


U.S. GOVERNMENT PRINTING OFFICE: 1984 454 955 20048




DEPARTMENT OF
HEALTH & HUMAN SERVICES

Public Health Service
Alcohol, Drug Abuse, and Mental
Health Administration
Rockville MD 20857




Official Business
Penalty for Private Use $300




NOTICE OF MAILING CHANGE

□ Check here if you wish to discontinue receiving this type of publication.

□ Check here if your address has changed and you wish to continue receiving this type of publication. (Be sure to furnish your complete address including zip code.)

Tear off cover with address label and publication number still affixed and send to:
National Institute of Mental Health
Science Communication Branch
5600 Fishers Lane (Room 15-99)
Rockville, Maryland 20857


DHHS Publication No. (ADM) 84-1357
Printed 1984


